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ABSTRACT 


This dissertation approaches the solution of optimization models with uncertain pa¬ 
rameters by considering the worst-case value of the uncertain parameters during opti¬ 
mization. We consider three problems resulting from this approach: a hnite minimax 
problem (FMX), a semi-inhnite minimax problem (SMX), and a semi-inhnite min- 
max-min problem (MXM). In all problems, we consider nonlinear functions with 
continuous variables. We find that smoothing algorithms for (FMX) may only have 
sublinear rates of convergence, but their complexity in the number of functions is 
competitive with other algorithms. We present two new smoothing algorithms with 
novel precision-adjustment schemes for (FMX). For (SMX) algorithms, we present a 
novel way of expressing rate of convergence in terms of computational work instead 
of the typical number of iterations, and show how the new way allows for a fairer 
comparison of different algorithms. We propose a new approach to solve (MXM), 
based on discretization and reformulation of (MXM) as a constrained hnite minimax 
problem. Our approach is the hrst to solve (MXM) in the general case where the in¬ 
nermost feasible region depends on the variables in the outer problems. We conduct 
numerical studies for all three problems. 
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EXECUTIVE SUMMARY 


Optimization problems with uncertain parameters arise in numerous applications. 
One possible approach to handle such problems is to consider the worst-case value of 
the uncertain parameter during optimization. We consider three problems resulting 
from this approach; a hnite minimax problem (FMX), a semi-inhnite minimax prob¬ 
lem (SMX), and a semi-inhnite min-max-min problem (MXM). In all problems, we 
consider nonlinear functions with continuous variables. We develop rate of conver¬ 
gence and complexity results, and propose algorithms for solving these optimization 
problems. 

In (FMX), we solve a minimax problem with a hnite number of variables and 
functions. We develop rate of convergence and complexity results of smoothing algo¬ 
rithms for solving (FMX) with many functions. We hnd that smoothing algorithms 
may only have sublinear rates of convergence, but their complexity in the number 
of functions is competitive with other algorithms due to small computational work 
per iteration. We present two smoothing algorithms with novel precision-adjustment 
schemes and carry out a comprehensive numerical comparison with other algorithms 
from the literature. We hnd that the proposed algorithms are competitive with SQP 
algorithms, and especially efficient for problem instances with many variables, or 
where a signihcant number of functions are nearly active at stationary points. 

The numerical results also indicate that smoothing with hrst-order gradient 
methods is likely the only viable approach to solve (FMX) with a large number of 
functions and variables due to memory issues. 

For (SMX), we solve a minimax problem with a hnite number of variables, 
but an inhnite number of functions. We develop and compare the rate of conver¬ 
gence results for various hxed and adaptive discretization algorithms, as well as an 
e-subgradient algorithm. We present a novel way of expressing rate of convergence, in 
terms of computational work instead of the typical number of iterations. Hence, we 
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are able to identify algorithms that are competitive due to low computational work per 
iteration even if they require many iterations. We show that a hxed discretization al¬ 
gorithm with a quadratically or linearly convergent algorithm map for the discretized 
problem can achieve the same asymptotic convergence rate attained by an adaptive 
discretization algorithm. We show that under certain convexity assumptions, the rates 
of convergence for discretization algorithms depend on the dimension of the uncertain 
parameters, while the rates of convergence for e-subgradient algorithms are indepen¬ 
dent of the dimension of the uncertain parameters under certain convexity-concavity 
assumptions. This indicates that under convexity-concavity assumptions, discretiza¬ 
tion algorithms are not competitive with e-subgradient algorithms for problems with 
large dimensions of the uncertain parameters, and that conclusion is validated by our 
numerical results. 

In (MXM), the variables in each layer of the problem vary within compact 
continuous sets. We consider two cases depending whether the inner feasible region 
is a constant set, which we denote by (SMXM), or depends on decision variables 
of the outer min-max problem, which we call the generalized semi-inhnite min-max- 
min problem, and denote by (GMXM). We propose a new approach to solve (MXM), 
based on discretization and reformulation of (MXM) into a constrained hnite minimax 
problem with a larger dimensionality than the original (MXM). Our approach is 
the hrst to solve (GMXM) in the literature and it also solves (SMXM). We apply 
our approach on a defender-attacker-defender network interdiction problem, which 
demonstrates the viability of the approach. 



I. INTRODUCTION 

A. MOTIVATION AND BACKGROUND 

Most, if not all, decisions in the real world are made under some uncertainty, 
for example, Apple needs to decide on the plant capacity for manufacturing the iPad 
before knowing the demands for it, the Department of Homeland Security needs to 
make investment and operational decisions not knowing where and how the next ter¬ 
rorist attack will occur, and almost everyone invests in stocks and bonds not knowing 
if they will turn out to be prohtable. 

In optimization models, uncertainty usually shows up as uncertain parameters 
in the model formulation. A common approach taken to handle the uncertainty is to 
use the average value of the parameter, or to use its most-likely value, and then use 
deterministic optimization to hud an optimal solution. There are many examples that 
show that optimal solutions based on such point estimates of uncertain parameters 
are not robust, i.e., small changes to the parameters cause the previously optimal 
solution to have a much worse outcome. 

The importance of considering uncertainty in optimization can be seen by the 
number of techniques developed for it (Sahinidis, 2004; Rockafellar, 2007), among 
which are the tools of stochastic programming (Shapiro, Dentcheva, & Ruszczyhski, 
2009) and stochastic dynamic programming (Powell, 2007). The main challenge with 
the technique is the availability or even the existence of the probability distribution 
for certain parameters. 

An example where no probability distribution exists is that of an adversar¬ 
ial situation, where an adversary wants to maximize damage to you, or minimize 
your ability to achieve certain objectives. In such problems it is reasonable to use a 
minimax formulation, to minimize the worst-case damage that can be caused by the 
adversary. Optimizing our actions against the worst-case scenario is the topic of this 
dissertation. 
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B. SCOPE OF DISSERTATION 

In this dissertation, we consider three problems of increasing difficulty: an 
unconstrained hnite minimax problem (FMX), an unconstrained semi-inhnite mini¬ 
max problem (SMX), and a constrained semi-inhnite min-max-min problem (MXM). 
Specihcally, we develop rate of convergence and complexity results, as well as algo¬ 
rithms for solving these problems. In all problems, we consider nonlinear functions 
with continuous variables. 

In (FMX), we solve a minimax problem with a hnite number of variables and 
functions. There are several approaches to solve (FMX). We consider one such ap¬ 
proach, that of smoothing algorithms. In smoothing algorithms (see for example 
Polak, Royset, & Womersley, 2003; Polak, Womersley, & Yin, 2008; Ye, Liu, Zhou, & 
Liu, 2008; Li, 1992; Xu, 2001), we create a smooth function that approximates the non- 
diherentiable pointwise maximum function and minimize the smooth approximating 
function. As noted in Polak et ah (2003), the key strength of smoothing algorithms 
is that they convert minimax problems into simple, smooth, and unconstrained opti¬ 
mization problems that can be solved using any standard unconstrained optimization 
algorithms. While complexity and rate of convergence have been studied extensively 
for nonlinear programs and minimax problems (see for example Nemirovski & Yudin, 
1983; Drezner, 1987; Wiest & Polak, 1991; Nesterov, 1995; Ariyawansa & Jiang, 2000; 
Nesterov & Vial, 2004; Nesterov, 2004), the topics have been largely overlooked in the 
specihc context of smoothing algorithms for (FMX). We discuss complexity and rate 
of convergence for smoothing algorithms for (FMX), and propose two new smooth¬ 
ing algorithms to solve (FMX). We consider problem instances of (FMX) with up to 
10,000,000 functions and up to 10,000 variables in the numerical studies to compare 
the new smoothing algorithms with other algorithms from the literature. 

For (SMX), we solve a minimax problem with a hnite number of variables, 
but an inhnite number of functions. The focus of our research for (SMX) is on a 
novel way of expressing rate of convergence of algorithms. Consider two (SMX) al- 
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gorithms, a linearly convergent algorithm and a superlinearly convergent algorithm. 
Since conventional rate of convergence do not consider computational work, it is pos¬ 
sible for the linearly convergent algorithm to generate an iterate every second, while 
the superlinearly convergent algorithm to generate an iterate every hour. Or worse 
still, the superlinearly convergent algorithm takes an hour to generate the hrst it¬ 
erate, and the run time to generate subsequent iterates doubles at every iteration. 
As mentioned, this lack of correlation between rate of convergence and run time is 
because conventional rate of convergence do not consider computational work. We 
propose a new way of expressing rate of convergence, which considers computational 
work. We select several (SMX) algorithms to illustrate how the new way of express¬ 
ing rate of convergence addresses the issues of the conventional way described above. 
Specihcally, we examine discretization and e-subgradient algorithms. Discretization 
algorithms are one of the more popular classes of algorithms for solving SIPs due to 
their simplicity. In discretization algorithms, we solve a sequence of hnite minimax 
problems, where the number of functions considered increases. Since the computa¬ 
tional work to solve a hnite minimax problem depends on the number of functions 
in the problem, a discretization algorithm takes increasingly longer time to gener¬ 
ate an iterate as the discretization algorithm progresses. An e-subgradient algorithm 
does not use discretization to solve (SMX) and is well-known to have a sublinear rate 
of convergence. Its run time does not vary much between iterations. Compared to 
the conventional way of expressing rate of convergence, we show that the new way 
allows us to conduct a fairer comparison between the e-subgradient algorithm and 
discretization algorithms. We also conduct numerical studies to validate the rate-of- 
convergence results that we obtain. 

In (MXM), the variables in each layer of the problem vary within a compact 
set with uncountable cardinality. We consider two cases depending whether the inner 
feasible region is a constant set, which we denote by (SMXM), or depends on decision 
variables of the outer min-max problem, which we call the generalized semi-inhnite 
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min-max-min problem, and denote by (GMXM). The problem (MXM) is difficult 
to solve, which explains the rather limited literature on (SMXM), and so far, there 
is no solution approach for (GMXM). We propose a new approach to solve (MXM), 
based on discretization and reformulation of (MXM) into a constrained hnite minimax 
problem. We apply the approach on a defender-attacker-defender network interdiction 
problem for a 10-node 18-arc network to demonstrate the viability of the approach. 

C. CONTRIBUTIONS 

The main contributions of this dissertation are as follows. We provide the 
hrst complexity and rate-of-convergence analyses of smoothing algorithms for solving 
(FMX). We develop two new smoothing algorithms with novel precision-adjustment 
schemes. We conduct a comprehensive numerical comparison of our algorithms with 
other algorithms from the literature, considering problem instances with the number 
of functions two orders of magnitude larger than problem instances considered in the 
literature. The numerical results indicate that the two new smoothing algorithms are 
competitive with the other algorithms compared. 

For (SMX), we present a novel way of expressing rate of convergence, in terms 
of computational work instead of the typical number of iterations, which allows for a 
fairer comparison of algorithms. We show that a hxed discretization algorithm with 
quadratically or linearly convergent algorithm map can achieve the same asymptotic 
convergence rate in terms of computational work as the one attained by an adaptive 
discretization algorithm. We show that under certain convexity-concavity assump¬ 
tions, discretization algorithms are not competitive with e-subgradient algorithms for 
problems with large dimension of the uncertain parameters, which we also validated 
in numerical tests. 
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We develop the first exact algorithm for (GMXM), which also results in a 
novel approach for solving (SMXM). If (MXM) has an objective function that is 
convex in the inner and outer minimization variables, and the inner and outer feasible 
regions are convex, then our algorithm guarantees convergence to a global minimizer 
of (MXM). 

D. MATHEMATICAL BACKGROUND 

This section dehnes notation and mathematical concepts used throughout this 
dissertation. Throughout the dissertation, M” denotes the n-dimensional Euclidean 
space, N = {1,2,...}, No = N U {0}, | ■ | represents the cardinality operator, || ■ || 
represents the Euclidean norm operator, denotes the transpose of the matrix A, 
and Xi —X represents that given a it' C N, for every e > 0, there exists an G iT 
such that \xi — x\ < e for all i > ii,i E K. Other than the above notation, which is 
used to denote the same quantities throughout this dissertation, some notation may 
be used to represent different quantities in different chapters. 

1. Continuity of Max Functions 

The following results on the continuity of the pointwise maximum (applies to 
minimum as well) function are used repeatedly throughout the dissertation. 

Proposition I.l. Suppose that the functions —)■ M, j G Q = {1,2, ...,g},g G 

N, are continuous for all x G d E N. Then the pointwise maximum function 
—)■ M defined by 

'iIj{x) = max f^(x) (I.l) 

j&Q 

is continuous for all x G □ 

Proposition 1.2. Let Y C be a compact set, and the functions ((){■, y), where 
(f -.Mf X —)■ M, be continuous for all y eY on Then the pointwise maximum 

function x M'" —)■ M defined by 

fj{x) = max(l){x,y) (1.2) 

y& 
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is continuous for all x □ 

The proofs for Propositions I.l and 1.2 can be found on pp. 51 and 187 of 
Demyanov and Malozamov (1974), repectively. 

2. Rate of Convergence 

Two key performance measures of an optimization algorithm are its complexity 
and rate of convergence. We dehne the different rates of convergence next, based on 
Bertsekas (1999, pp. 63-65) and Nocedal and Wright (2006, pp. 619-620). 

Consider a sequence of points ^ converging to x* G Rate of 

convergence can be evaluated using an error function —)■ M, where e„ > 0 for 

all n G No and ^ 0 as n — )■ cxd . The two common-used error functions are based 
on Euclidean distance 

Cn ll^n ^ 111 (^■^) 

and function values 

en=\f{Xn) - f{x*)\. (1.4) 

We say that the convergence is sublinear if 


Cn+l .. 

hmsup-= 1. 

n^oo 

The convergence is linear if there exist c G (0,1) and ni G No such that 

Cn+l , 

-< c. 


(1.5) 


( 1 . 6 ) 


for all n > Hi- Convergence is superlinear if 


limsup = 0. 

n^oo 


(1.7) 


We say that we have order of convergence r > 1 if there exist a c > 0 and a ni G No 
such that 

^n+l 


(Cn)’ 


< c. 


( 1 . 8 ) 


for all n > ni- When r = 2, we call the convergence quadratic. 
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If two sequences converge sublinearly, we say that they have the same rate, 
even if the constants are different. Similar comments hold for linear and superlinear 
convergence. We say that two sequences that are both superlinear converge at the 
same rate. We say that two sequences that are both superlinear but with different 
orders converge at the same rate but with different orders. We also say that superlin¬ 
ear convergence is faster than linear convergence, which again is faster than sublinear 
convergence. 

We say that an algorithm map used to solve a problem (P) converges sub- 
linearly, linearly, or superlinearly if the sequence generated by the algorithm map 
converges sublinearly, linearly, or superlinearly, respectively. 

3. Consistent Approximations 

This subsection discusses the theory of consistent approximations (Polak, 2003, 
1997). Consider the problem 

(P) min/(a;), (1.9) 

xGX 

where X C and / : —)■ M is continnous. 

Next, given TV G N, consider an approximate problem to (P) 

(Pv) fN{x), (1.10) 

xGXn 

where C and /tv : t M is continnous. 

Two properties are required for the approximating problems (as N — )■ cxd) to 
be consistent approximations to (P). First, we need the epi-convergence of (Ptv) to 
(P) as X —)■ cx). For a detailed discussion of epi-convergence; see Polak (1997, Section 
3.3.1) or Rockafellar and Wets (1998, Sections IB, 4B, & 7B). We here give essential 
definitions and results for our study. 

We dehne the epigraph 

E= {{x,z) e I a; e x,;^ > f{x)} . (I.ll) 
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The set E consists of all the points in on or above the function /(■). Similarly, 
the epigraph 

E^= {{x,z)eR^+^ \xeXM,z>fN{x)}. ( 1 . 12 ) 

Epi-convergence of (Ptv) to (P) as iV —)■ cx) is then dehned as set convergence of the 
epigraphs Ej^r to E, in the sense of Painleve-Kuratowski, as in Dehnition 5.3.6 of 
Polak (1997). For completeness, we restate the dehnition of set convergence in the 
sense of Painleve-Kuratowski. 


Definition I.l. Consider a sequence of sets C 

(i) We dehne the distance between a point x and a set At as 

p(a;, Ai) = inf {||a; — a;|| \ x E Ai} . (1-13) 

The point a; is a limit point of {AjiTg if p{x, A^) —)■ 0 as i —)■ cxd (that is, a; is a 
limit point of if there exist a a;* G A* for all i G N, such that Xi —)■ a;, as 

i —)■ cxd). 

(ii) The point a; is a cluster point of is ^ limit point of a subsequence 

of 

(iii) We denote the set of limit points of {Aj}^o ^>7 hminf A^, and we denote the set 

of cluster points of ^>7 limsupAj. 

(iv) The sets Ai converge in the sense of Painleve-Kuratowski to the set A as i ^ oo 

if lim inf A^ = lim sup A^ = A. □ 

An alternate way to prove epi-convergence is provided by the following propo¬ 
sition, extracted from Polak (2003, Theorem 3.1). 

Proposition 1.3. The sequence of problems {(Pv)} 7 VeN epi-converges to (P) as N ^ 
oo if and only if 

(i) for every x E X, there exists a sequence {a;Ar}ArgF}, where x^ G Xj^, xjsi ^ x as 
N —)■ oo, and lim sup /tv( a; at) < f{x); 

(ii) for every infinite sequence {xisi}n&k, where K d xn E X^ for all N E K, 

and xn — X as N ^ oo, then x E X and liminf fNix^) > f{x). □ 



The importance of epi-convergence is stated in the next result, extracted from 
Polak (2003, Theorem 3.2). 

Proposition 1.4. Suppose that the sequence of problems {(PAr)}Ar6N epi-converges to 
(P) as N ^ oo. Then the following facts hold: 

(i) If {xn} is a sequence of global minimizers o/(P 7 v) and there exists an infinite 
subset K E N such that xn x as N ^ oo, then x is a global minimizer of 
(P), and fNi^N) fix) as N ^ oo. 

(a) If {xn} is a sequence of local minimizers of (Pat) sharing a common radius of 
attraction p > 0 (i.e., for all N eN, /Ar(xAr) < fNix) for all x E Xj^ such that 
11 —a;AT 11 < p), and there exists an infinite subset K eN such that x^ x as 
N -E oo, then x is a local minimizer of (P), and fNixN) — fix) as N ^ oo. 

□ 

Epi-convergence does not rule out the possibility that an arbitrary sequence 
of local minimizers of (Ptv) may have an accumulation point that is neither a local 
minimizer nor a stationary point. To ensure that accumulation points of a sequence 
of stationary points of (Ptv) are stationary points of (P), a suitable characterization of 
stationarity is required, such as the use of optimality functions as defined by Definition 
3.3 of Polak (2003). 

Definition 1.2. A function 6^ : —)■ M is an optimality function for (P) if (i) 0(-) is 
upper semi-continuous, (ii) 6ix) <0 for all x E and (hi) if a; is a local minimizer 
of (P), then Six) = 0. Similarly, a function 6^ ^ is an optimality function for 

(Pat) if (i) 6*Af(-) is upper semi-continuous, (ii) O^ix) < 0 for all x E and (hi) if 
Xn is a local minimizer of (Pat), then ONixN) = 0. □ 

We next dehne consistent approximations, as per Dehnition 3.4 of Polak 

(2003). 

Definition 1.3. The pairs ((Pat), 6'Ar(-)); in fhe sequence {((Pat), 6*7v(-))} con¬ 
sistent approximations to the pair ((P),6'(-)) if (i) (Ptv) epi-converges to (P) as 
N ^ oo and (ii) for any infinite sequence {xN}NeK, X C N where xn —t x, 
limsup6'Ar(a;7v) < 6'(a;). □ 
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Consistent approximations ensnre that given a seqnence of approximate sta¬ 
tionary points {xat}, where 6n{xn) ^ 0 as iV —)■ cxd, and xn —)■ £ as iV —)■ cxd, then 
6{x) = 0, i.e., a; is a stationary point of (P). 

E. ORGANIZATION 

The remainder of this dissertation is ontlined as follows. Chapter II devel¬ 
ops resnlts for rate of convergence and complexity for smoothing algorithms to solve 
(FMX). We present two new smoothing algorithms with novel precision-adjnstment 
schemes and carry ont a comprehensive nnmerical comparison with other algorithms 
from the literature. In Chapter III, we present a novel way of expressing rate of con¬ 
vergence, in terms of computational work instead of the typical number of iterations. 
We develop and compare rate-of-convergence results for various hxed and adaptive 
discretization algorithms as well as an e-subgradient algorithm. In Chapter IV, we 
propose a new approach to solve (MXM). We apply the approach to solve a defender- 
attacker-defender network interdiction problem to illustrate the viability of our new 
approach. Chapter V covers the conclusions and future research opportunities. 
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II. 


FINITE MINIMAX PROBLEM 


A. INTRODUCTION 

This chapter considers hnite minimax problems of the form 

(FMX) (II.l) 

where —)■ M is dehned by 

= max/-^(a;), (II-2) 

j&Q 

and —)■ M, j G Q = {1,2, g G N, are twice continnously differentiable. 

(FMX) is “hnite” as we consider a hnite nnmber of fnnctions, as compared to the 
“semi-inhnite” problems (SMX) and (MXM) where we consider an inhnite nnmber 
of fnnctions. Finite minimax problems of the form (FMX) may occur in engineering 
design (Polak, 1987), control system design (Polak, Salcudean, & Mayne, 1987), port¬ 
folio optimization (Cai, Teo, Yang, & Zhou, 2000), best polynomial approximation 
(Demyanov & Malozemov, 1974), or as subproblems in semi-inhnite minimax algo¬ 
rithms (Panier & Tits, 1989). We focus on minimax problems with many functions, 
i.e., large q, which may result from hnely discretized semi-inhnite minimax problems 
or optimal control problems; see for example Panier and Tits (1989); Zhou and Tits 
(1996). We develop algorithms for such problems and analyze their efficiency. An 
abbreviated version of this chapter is published separately (Pee & Royset, 2010). 

The non-diherentiability of the objective function in (FMX) poses the main 
challenge for solving minimax problems, as standard unconstrained optimization al¬ 
gorithms do not apply directly. Many algorithms have been proposed to solve (FMX); 
see for example Zhou and Tits (1996); Polak et ah (2003); Obasanjo et ah (2010) and 
references therein. One approach is sequential quadratic programming (SQP), where 
(FMX) is hrst reformulated into the standard nonlinear constrained problem 

(FMX') min {z \ f^x) -z<0 Vj G Q} (II.3) 

(x,2:)eR‘*+l 
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and then an SQP algorithm is applied to (FMX'), advantageously exploiting the 
special structure in the new formulation (Zhou & Tits, 1996; Zhu, Cai, & Jian, 2009). 
Other approaches also based on (FMX') include interior point methods (Sturm & 
Zhang, 1995; Obasanjo et ah, 2010; Luksan, Matonoha, & Vlcek, 2005) and conjugate 
gradient methods in conjunction with exact penalties and smoothing (Ye et ah, 2008). 

Due to its aggressive active-set strategy, the SQP algorithm in Zhou and Tits 
(1996) appears especially promising for problems with many sequentially-related func¬ 
tions (in the sense that the values taken by /■'(■) are typically close to the values taken 
by /^+H')). as in the case of hnely discretized semi-inhnite minimax problems. The 
SQP algorithm in Zhou and Tits (1996) needs to solve two quadratic programs (QPs) 
in each iteration. Recently, Zhu et ah (2009) propose an SQP algorithm that requires 
the solution of only one QP per iteration, yet this algorithm retains global conver¬ 
gence and superlinear rate of convergence as in the algorithm in Zhou and Tits (1996). 
Furthermore, the algorithm in Zhu et ah (2009) does not use an active-set strategy. 
At a point x G M'^, we call a function G Q, active if f^{x) = t/’(a^), and e-active 

(e > 0) if f^{x) > 'ip{x) — e. In general, an active-set strategy only considers functions 
that are e-active (and disregards the other functions) at the current iterate, and thus 
greatly reduces the number of function and gradient evaluations at each iteration of 
an algorithm. While the number of iterations needed to solve a problem to required 
precision may increase, the overall effect may be a reduction in the number of func¬ 
tion and gradient evaluations, and that may translate into reduced computing times. 
For example, Polak et ah (2008) reports a 75% reduction in the number of gradient 
evaluations, and Zhou and Tits (1996) reports reductions in computing times with 
active-set strategies. 

In smoothing algorithms (see for example Polak et ah, 2003, 2008; Ye et ah, 
2008; Li, 1992; Xu, 2001), we create a smooth function (using exponential smoothing, 
to be discussed in Section II.B) that approximates the non-differentiable '^(•) and 
minimize the smooth approximating function. We refer to the resulting problem 
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of minimizing the smooth approximating function as a smoothed problem. As the 
smoothed problem remains unconstrained, one can use any standard unconstrained 
optimization algorithm, such as the Armijo Gradient or Newton methods (Polak et ah, 
2003) or a Quasi-Newton method (Polak et ah, 2008). 

A fundamental challenge for smoothing algorithms is that the smoothed prob¬ 
lem becomes increasingly ill-conditioned as the approximation becomes more accu¬ 
rate. Consequently, the use of smoothing techniques is complicated by the need to 
balance the accuracy of the approximation with problem ill-conditioning. The sim¬ 
plest smoothing algorithm creates an accurate smooth approximating function and 
solve it. This simple static scheme of constructing a single smoothed problem and 
solving it is highly sensitive to the choice of accuracy and has poor numerical perfor¬ 
mance (Polak et ah, 2003). An attempt to address this challenge by using a sequence 
of smoothed problems was hrst made in Xu (2001), where a precision parameter that 
controls approximation accuracy is initially set to a pre-selected value and then dou¬ 
bled at each iteration. Effectively, in this open-loop scheme to precision adjustment, 
the algorithm approximately solves a sequence of gradually more accurate approxi¬ 
mations. This open-loop scheme is sensitive to the multiplication factor (Polak et ah, 
2003). 

Polak et ah (2003) propose an adaptive precision-parameter adjustment scheme 
that controls problem ill-conditioning by keeping a smoothing precision parameter 
small when far from a stationary solution, and increasing the parameter as a sta¬ 
tionary solution is approached. Numerical results show that the scheme manages 
ill-conditioning better than static and open-loop schemes. The smoothing algorithms 
in Xu (2001) and Polak et al. (2003) do not incorporate any active-set strategy. 

Using the adaptive precision-parameter adjustment scheme in Polak et al. 
(2003), Polak et al. (2008) presents an active-set strategy for smoothing algorithms 
that tackles (FMX) with large q. We note that the convergence result in Theorem 
3.3 of Polak et al. (2008) may be slightly incorrect as it claims stationarity for all 
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accumulation points of a sequence constructed by the algorithm in Polak et ah (2008). 
However, the proof for Theorem 3.3 of Polak et al. (2008) relies on Polak et al. (2003), 
which guarantees stationarity for only a single accumulation point. 

This chapter examines smoothing algorithms for (FMX) with large q from two 
angles. First, we discuss complexity and rate of convergence for such algorithms. We 
dehne complexity as the computational work of an algorithm on a serial machine to 
obtain a solution that is within a specihed error tolerance of the optimal solution 
of a problem, expressed as a function of the sizes of a specihc set of inputs for the 
problem. While complexity and rate of convergence have been studied extensively 
for nonlinear programs and minimax problems (see for example Nemirovski & Yudin, 
1983; Drezner, 1987; Wiest & Polak, 1991; Nesterov, 1995; Ariyawansa & Jiang, 2000; 
Nesterov & Vial, 2004; Nesterov, 2004), the topics have been largely overlooked in the 
specihc context of smoothing algorithms for (FMX). A challenge here is the increasing 
ill-conditioning of the smoothed problem as the smoothing precision improves. We 
quantify the degree of ill-conditioning and use this result to analyze complexity and 
rate of convergence. We hnd that the rate of convergence may be sublinear, but 
low computational work per iteration yields complexity, as a function of g, that is 
competitive with several other algorithms. 

Second, we consider implementation and numerical performance of smooth¬ 
ing algorithms. A challenge here is to construct schemes for selecting the precision 
parameter that guarantee convergence to stationary points and perform well em¬ 
pirically. As discussed above, static and open-loop precision-parameter adjustment 
schemes result in poor numerical performance and, thus, we develop two adaptive 
schemes. In extensive tests against other algorithms, smoothing algorithms with the 
adaptive schemes are competitive, and especially so for problem with many variables, 
or where a signihcant number of functions are nearly active at stationary points. 
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B. EXPONENTIAL SMOOTHING 


For ease of analysis of active-set strategies, we consider the problem 

(FMX^) min^(a;), (II.4) 

where ipQ.{x) = maxj^Q (x), and C Q. When = Q, (FMXg) is identical to 
(FMX). For simplicity of notation, we drop subscripts Q in several contexts below. 
Next, for any Q O Q and for a parameter p > 0, we dehne a smoothed problem to 
(FMX^,) by 

(™X minjjpnix), (II.5) 

where 

^/jpn{x) = - log ( ^ exp {pP (x)) 

P \j&Q 

= +-log I ^exp (p(/-^(a;)j (II.6) 

^ Vieo / 

is an exponential penalty function. We denote (FlvIX^g) by (FMXp) for brevity. 
This smoothing technique was introduced in Kort and Bertsekas (1972) and used in 
Polak et ah (2003, 2008); Ye et ah (2008); Li (1992); Xu (2001). The exponential 
penalty function has been commonly used in smoothing algorithms as it preserves 
differentiability (as formalized in Proposition II.1) and convexity (Li & Fang, 1997). 

We denote the set of active functions at a; G by r2(a;) = {j G pfpx) = 
'ippx)}- Except as stated in Appendix A, we denote components of a vector by 
superscripts. 

The parameter p > 0 is a smoothing precision parameter, where a larger p 
implies higher precision as illustrated in Figure 1 and formalized by Proposition II.l; 
see for example Polak et ah (2008). In Figure 1, = {1,2,3} and the subscript “f2” 

has been dropped from the notation. The numbers in the subscripts are p values. 

Proposition II.l. Suppose that Q G Q and p > 0. 
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Figure 1. Smoothed Problems. 


(i) If the functions f^{-), j G hi, are continuous, then fjpni.') is continuous, and for 
any x G M.^, 'ippnix) decreases monotonically as p increases. 

(a) For any x G 


log|l](a;)| log|(]| 

0 < -< fjpn{x) - fJn{x) < - 


P 


P 


(11.7) 


where \ ■ \ represents the cardinality operator. 


(Hi) If the functions f^{-), j G hi, are continuously differentiable, then 'ippnf) is 
continuously differentiable, with gradient 

V^,o(x) = 5^/x^(a;)V/^(a;), (11.8) 

j&n 

where 

A exp(p/^(U) _ exp(p[/J(U - fc(U]) ^ gj 

^ exp(p/^(a;)) ^ exp(p[/^(a;) - fjnix)]) 

ken ken 


and Ppi^) ~ ^ ^ ^ 
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(iv) If the functions /■’(•)? j ^ ore twice continuously differentiable, then f’pnf) 
is twice continuously differentiable, with Hessian 




for all X e 


jen j&n 


■p 




-jeo 



(II.IO) 


□ 


We define a continuous, nonpositive optimality function ^ M for all 


a; G by 


On{x) = - min 

A^eSn 


- rix)) + I 

jeQ 


jen 


X] 


(II.ll) 


where Sq = {/i G | > 0 Vj G = !}■ The following optimality 

condition for (FMXj^) is expressed in terms of 0q{-)-, see Theorems 2.1.1, 2.1.3, and 
2.1.6 of Polak (1997). 


Proposition II.2. Suppose that the functions f^{-),j G Q, are continuously dif¬ 
ferentiable and that hi C Q. If x* E Mf is a local minimizer for (FMXn), then 
en{x*) = 0 . □ 


At stationary points of (FIvIXp^), the continuous, nonpositive optimality func¬ 
tion OpQ : —)■ M dehned by 9pn{x) = — ||| V'^pn(a;)|p for all x G M^, vanishes to 

zero. 


C. RATE OF CONVERGENCE AND COMPLEXITY 

This section examines the following basic smoothing algorithm, for which we 
develop a series of complexity and rate-of-convergence results. We use this simple 
algorithm to gain some fundamental insights on smoothing algorithms, but yet main¬ 
tain tractability of the analysis. When they exist, we denote optimal solutions of 
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(FMX) and (FMXp) by x* and x*, respectively, and the corresponding optimal val¬ 
ues by ^|J* and ip*. The algorithm applies the Armijo Gradient Method to (FMXp), 
starting at an initial point xq. The value of p is hxed at p* and it guarantees that 
Proposition II.3 stated below holds. The Armijo Gradient Method uses the steep¬ 
est descent search direction and the Armijo stepsize rule to solve an unconstrained 
problem; see for example Algorithm 1.3.3 of Polak (1997). 

Algorithm II. 1. Smoothing Armijo Gradient Algorithm 
Data: Error tolerance t > 0, Xq G 
Parameter: 5 G (0,1). 

Step 1. Set p* = (logg)/((l — 6)t). 

Step 2. Generate a sequence {xi}°Zo ^>7 applying Armijo Gradient Method to 

(FMXp*)- □ 

In this dissertation, we have several algorithms (including Algorithm II. 1) with 
no termination criteria stated in the algorithm procedure. In general for nonlinear 
programming, there are often more than one possible termination criterion for each 
algorithm. For example, a possible termination criterion for unconstrained nonlinear 
optimization is the norm of the search direction falls below a certain small number. 
Determining an appropriate criterion is often application dependent. In all our nu¬ 
merical studies, we terminate the algorithms when (i) the current iterate falls within 
a certain error tolerance of the optimal solution or objective function value, or (ii) the 
solution satishes the default tolerances of the solver used. We state the termination 
criterion in the numerical section of each chapter. 

Algorithm II. 1 has the following property. 

Proposition II.3. Suppose that q >2 and Step 2 of Algorithm II.1 has generated a 
point Xi G such that 'ipp*{xi) — tjj** < St. Then 'ip{xi) — fj* <t. 

Proof. By the optimality of ■^p* and (II.7), -ip*, < ipp*{x*) < ip* + {\ogq)/p*. Thus, 
—Ip* < —Ip** -I- (logg)/p*. Based on (II.7), ip{xi) < ipp*{xi) and hence, ip{xi) — ip* < 
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'^p* + (logg)/p*. Since < St andp* is as in Step 1, the conclusion 

follows. □ 

In the proposition above, the number of functions considered has been con¬ 
strained to be two or more {q > 2) as it is not meaningful to take the pointwise 
maximum of a single function. For a hxed p > 0, the rate of convergence of the 
Armijo Gradient Method as applied to (FMXp) is well known (see for example p. 60 
of Polak, 1997). However, the value of the precision parameter p* in Algorithm II. 1 is 
dictated by q and t (see Step 1), which complicates the analysis. For large values of q 
or small values of t, p* is large and hence (FMXp») may be ill-conditioned as observed 
empirically (Polak et ah, 2003). In this chapter, we quantify the ill-conditioning of 
(FMXp) as a function of p and obtain complexity and rate of convergence results for 
Algorithm II. 1. 

1. Ill-Conditioning of Smoothed Problem 

The following strong convexity assumption is a standard assumption required 
for complexity and rate of convergence analyses. 

Assumption II.4. The functions G N, are 

(i) twice continuously differentiable and 
(a) there exists an m > 0 such that 

m\\yf<{y,V^P{x)y), (11.12) 

for all x,y e and j G N. □ 

Lemma II.5. Suppose that Assumption II.4 holds. Then for any x,y E q E N, 
and p > 0, 

m\\yf < {y, V^fjp{x)y) , (11.13) 

with m as in Assumption II.4. 
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Proof. From (II. 10) and (11.12), we obtain that 


(y, VVp(a^)l/> = +p'^fj,^p{x){y,Vf^{x)Vf\xfy) 


p{ y, 


J2yi{x)vri 


X 


UeQ 




X 


UeQ 


(y,y^f (x)y} + p'^ /u^p(x) (y,Vf(x)y 


jeQ 


- piy. 


jeQ 




X) 


UeQ 


UeQ 


X 


> rn\\y\? + P^ Ppi.x) {y,VP{x)y -ply, 
jeQ \ 

Hence, we only need to show that the difference of the last two terms is nonnegative. 
Let —)■ M be the convex function dehned as g{z) = {y, for y,z E It 

follows from Jensen’s inequality (see for example p. 6 of Urruty & Baptiste, 1996) 
that 


^lxi{x)g{Vfnx))>g 


X 


(11,14) 




UeQ 


Since p > 0, the result follows. □ 

For any matrix A G we adopt the matrix norm ||H|| = max||u||=i ||Am||, 

where u E Under Assumption II.4(i), |/•^(a;)|, ||V/-^(a;)||, and ||V^p(a;)|| are 
bounded on bounded subsets of for given j E N. 

Assumption II.6. For any bounded set S C there exists a K E {0, oo) sueh that 
max{|p(a;)|, \\Vp{x)\\, \\Vpyx)\\} < K for all xeS,] eN. □ 


The assumption above holds for example under standard assumptions when 
/■^(•), j E N, arise from discretization of semi-inhnite max functions. Under this 
assumption, we obtain the following useful result. 


Lemma II.7. Suppose that Assumptions 11-4 (i) and II. 6 hold. Then for every bounded 
set S C 

{y,V^Mx)y) <pL\\yf (11.15) 
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for all X E S,y E q eN, and p > I, where L = K+2K‘^, with K as in Assumption 
II. 6. 


Proof. From the theory of matrix algebra, for matrices A E E and 

vector X E M”, we have that ||Aa;|| < ||A||||a;||, \\AB\\ < ||A||||i?||, and |la;a;^|| = ||a;||^ 
(see for example p. 26 of Gill, Murray, & Wright, 1991). We consider each of the 
three terms of see (II.10). Recall that J2jeQ ^ ^ ^ g G N, 

and p > 0. For any x E S, y E M'^, and q eN, under Assumption II.6, we obtain for 
the hrst term that 



< Ibll 


^/i^(a;)VV^(a^) ) V 

\j&Q 


< 




leg 


(11.16) 


where K is the constant in Assumption II.6 corresponding to S. Next, for the second 
term of V^'^p(-), 


p,5^/i^(a;)Vf(a;)Vf(a;)^l/) < ||p|| 


j&Q 


Y^^l(x)VP(x)VfH 


X 


JeQ 


\j&Q 

< K^yf. 


(11.17) 


For the third term, we obtain that 



^PPx)Vf(xj 

JeQ 


1 T 


T. 

.j&Q 




»Vf 


X) 


y 




- 

- 

T 

< hr 



j^4(x)vr(x) 




J&Q 

JeQ 



<K^yr 


(11.18) 


Hence, for all a; G S', p G W^, g G N and p > 1, {y,'V^ipp{x)y) < {K + pK'^ + 
pK^)\\yr<p{K + 2K^)\\yr. □ 


Lemma II.7 enables us to quantify the rate of convergence of the Armijo Gra¬ 
dient Method for (FMXp), as a function of p > 1, which we consider next. 
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Proposition II.8. Suppose that Assumptions II.4 and II.6 hold. For any bounded 
set S C there exists a k E {0, 1) such that the rate of convergence of the Armijo 
Gradient Method to solve (FMXp), initialized by xq E S, is linear with coefficient 
1 — k/p for any p > 1 and q E N. That is, for all sequences C generated 

by the Armijo Gradient Method when applied to (FMXp), for any p > I, q E N, and 
xo E S, we have that 




k 

< 1 -for alH G No- 

p 


(11.19) 


f’piXi) - % 

Proof. It follows by Lemma II.5 and Assumption II.6, and the fact that Xq G S, 
that there exists a bounded set S' C such that all sequences generated by Armijo 
Gradient Method on (FMXp), initialized by Xq G S, are contained in S' for all p > 1, 
q E N, Xq E S. Let m be as in Assumption 11.4 and K be the constant in Assumption 
11.6 corresponding to S'. In view of Lemmas 11.5 and 11.7, 


m\\yf < {y, V^fjp{x)y) < pL\\yf, (11.20) 

for all X E S', y E M., q E N, and p > 1, where L = K + 2K‘^. Hence, we deduce 
from Theorem 1.3.7 of Polak (1997) that the rate of convergence for Armijo Gradient 
Method to solve (FMXp) is linear with coefficient 1 — — a)/{pL) G (0,1) for 

all p > 1, g G N, Xq G S, where a,l3 E (0,1) are the Armijo line search parameters. 
Hence, 

k = 4:ml3a{l — a)/L, ( 11 - 21 ) 

which is less than unity because q;( 1 — a) G (0,1/4] and m < L in view of (11.20). □ 


2. Complexity 

The results above enable us to identify the complexity of Algorithm H.l un¬ 
der the following assumption on the computational work required for function and 
gradient evaluations. We let to = iP{xq) — ip* for a given xq E and q eN. 

Assumption II.9. There exist constants a,b E (0, cxd) such that for any d E N, 
j E N, and x G the computational work to evaluate either f^{x) or V f\x) is no 
larger than ad^. □ 
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Assumption 11.9 holds for all problem instances considered in this chapter (see 
Appendix A) and appears reasonable for many practical situations. The following 
result can easily be modihed to account for other assumption about work per function 
and gradient evaluation. 


Theorem II.10. Suppose that Assumptions II.4, 11-6, and II.9 hold, and that Algo¬ 
rithm II. 1 terminates after n iterations with 'ip{xn) —fj* <t. Then for any d eN and 
bounded set S C there exist constants c, c', t' G (0, cxd) such that the computational 
work until termination for Algorithm II. 1 is no larger than 

( 11 , 22 ) 

for all q Efl,q>2, Xq E S, 5 E (0,1), and t E (0, t']. 

Proof. Let q > 2 and t G (0,logg], which ensures that p* = (logg)/[(l — S)f\ > 1. 
Thus, Proposition II.8 applies and the number of iterations of the Armijo Gradient 
Method to generate {xi}f^Q such that 'ipp*{xn) — ipp* < St is no larger than 




(11.23) 


log(l - 

where k is the constant in Proposition II.8 corresponding to S and [•] denotes the 
ceiling operator. In view of Proposition II.3, Xn also satishes ipixn) — ip* < t. Since 
the main computational work in each iteration for the Armijo Gradient Method is to 
determine Vipp*{xi), it follows by Assumption II.9 that there exist a,b < oo such that 
the computational work in each iteration of the Armijo Gradient Method when applied 
to (FMXp*) is no larger than aqd^. Thus, the computational work in Algorithm II.1 
to termination at Xn is no larger than (11.23) multiplied by aqd^. Let f^* denote the 
minimum value of f^{-), which is hnite according to Assumption 11.4. Let K be the 
constant in Assumption 11.6 corresponding to S. We then hnd that to = ip{xo) —ip*^ 
K — f^* = c', for any Xq E S and q E N. It follows that the computational work in 
Algorithm 11.1 to termination at Xn is no larger than 

log^ 


aqd 


log(l - ^) 


(11.24) 
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for any q E N, q > 2, Xq E S, S E {0, 1), and t E (0,logg]. Since logo; < a; — 1 for 
X E (0,1], it follows by the choice of p* that the computational work in Algorithm 
II. 1 to termination at Xn is no larger than 


aq(f 


log 



St 

_c^_ 

k(l—5)t \ 

log*? J 


< aqd^ 


k(l—S)t 


log 9 


(11.25) 


for all q E N,q > 2, Xq E S, S E (0,1), and t E (0, min{log g, c'}]. 

There exists a t' E (0, min{log g, c'}] such that > \ for all t E (0,f'], 

g G N, g > 2, and 5 E (0,1). This then implies that for all g G N, g > 2, xq E S, 
S E (0,1), and t E (0, f], 


aqd!’ 


logglogfj 

k{l-6)t 


< 2aq(f 


logglogfjA 
k{l-6)t j 


2ad^ ( glogglogf^A 

k \ {l-5)t J 


(11.26) 


Since k (see (11.21)) only depends on m from Assumption II.4, K from Assumption 
II.6, and user-dehned parameters, the conclusion follows. □ 

We deduce from Theorem II. 10 and its proof that the number of iterations of 
Algorithm II. 1 required to achieve a solution with value within t of the optimal value 
of (FMX) is 0{{l/t) logl/t) for hxed g > 2, d G N, and 6 E (0,1). This is worse than 
for example the Pshenichnyi-Pironneau-Polak (PPP) min-max algorithm (Algorithm 
2.4.1 in Polak, 1997) and the modihed conjugate gradient method on pp. 282-283 of 
Nemirovski and Yudin (1983), , which achieves 0(logl/t). The SQP algorithm in 
Zhou and Tits (1996) may also require a low number of iterations as it converges 
superlinearly, but its complexity in t is unknown. The larger number of iterations for 
Algorithm II. 1 is caused by the fact that the Armijo Gradient Method exhibits slower 
rate of convergence as p increases (see Proposition II.8) and a larger p is required in 
Algorithm II. 1 for a smaller t. 

We next discuss the complexity of smoothing algorithms as compared to the 
SQP algorithms. We consider a sequence of hnite minimax problems with the same 
number of variables d, but with an increasing number of functions g. This occurs 
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for example, in the solution of semi-infinite minimax problems using discretization 
algorithms, which we discuss in Chapter III. 

When we also include the work per iteration of Algorithm II.1, we see from 
Theorem II.10 that for hxed t G (0,f'], d E N, and S G (0,1), the complexity is 
O(glogg). For comparison, the complexity of SQP and PPP algorithms to achieve a 
near-optimal solution of (FMX) is larger as we see next. 

The main computational work in an iteration of an SQP algorithm involves 
solving a convex QP with d + 1 variables and q inequality constraints (Zhou & Tits, 
1996). Introducing slack variables to convert into standard form, this subproblem 
becomes a convex QP with d + 1 + q variables and q equality constraints. Based 
on Monteiro and Adler (1989), the computational work to solve the converted QP is 
0{{d -|- 1 -|- q)^)- Assuming that the number of iterations an SQP algorithm needs to 
achieve a near-optimal solution of (FMX) is 0(1), for hxed t G (0,P] and d G N, the 
complexity of an SQP algorithm to achieve a near-optimal solution of (FMX) is no 
better than O(g^). The same result holds for the PPP algorithm. This complexity, 
when compared with 0{q log q) of Algorithm II. 1, indicates that smoothing algorithms 
may be more efficient than SQP and PPP algorithms for (FMX) with large q. We 
carry out a comprehensive numerical comparison of smoothing algorithms with SQP 
and PPP algorithms in Section IFF. We note that the modihed conjugate gradient 
method on pp. 282-283 of Nemirovski and Yudin (1983), may also have a low com¬ 
plexity in q, but this depends on its implementation and the method is only applicable 
to convex problems. 

3. Optimal Parameter Choice 

We see from Theorem II. 10 that the computational work in Algorithm II.1 
depends on the algorithm parameter S. In this subsection, we hnd an “optimal” 
choice of S. A direct minimization of (11.22) with respect to S appears difficult and 
thus, we carry out a rate analysis and determine an optimal S in that context. 

The notation t j, 0 means t approaches zero from above. We first consider the 
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situation as t 0 and let St G (0,1) be a choice of S in Algorithm II. 1 for a specihc 
t. For hxed d E N, q E N, q > 2, S C and Xq E S, let c and c' be as in Theorem 
II. 10 and let Wt denote (11.22) viewed as a function of t > 0, with S replaced by St, 


i.e., 


A. log 

Wt = C- 


with c = cq log q for all t > 0. The next result shows that the choice of {<5^ G (0,1) I ^ > 
0} influences the rate with which Wt ^ oo, as 11 0. However, any constant St for all 
t > 0 results in the slowest possible rate of increase in wt, an asymptotic rate of 1/t, 
as 110. 


Theorem II.11. For any G (0,1) \ t > 0}, 

log Wt 

hmsup -- < —1. 

40 log t 


If St = a E (0,1) for all t > 0, then 


logwt 

hm -- = — 1. 


(11.28) 


(11.29) 


40 log t 

Proof. There exists a G (0, cxd) such that log ^ > 1 for all t E (0, fi] and any 
{St G (0,1) I t > 0}. Hence, for any t E (0, min{l, ti}) and St E (0,1), 
log Wt 


logt 


log c ^ log log log(l - St) log t 


< 


logt 

logc 

logt 


logt 


logt 


logt 


(11.30) 


- 1 , 


and the hrst part follows. Taking limits in (11.30), with St = a, yields the second 
part. □ 

We next consider the situation as q ^ oo and, similar to above, let Sq E (0,1) 
be a choice of S in Algorithm H.l for a specihc g G N. For hxed d G N and S C 
let c and c' be as in Theorem 11.10. There exists a ti G (0, cxd) such that log(c/t) > 0 
and log(c'/t) > 1 for all t G (0,ti]. For any given g G N, g > 2 and t G (0,ti], let Wq 
denote (11.22) viewed as a function of g, with S replaced by Sq, i.e.. 


A /c^ glogglogj^ 


= U 


(11.31) 
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The next result shows that the choice of influences the rate with which 

Wq ^ oo, as q ^ oo. However, for sufficiently small tolerance t > 0, as above, any 
constant choice of Sq for all g G N results in the slowest possible rate of increase in 
Wq, as g — )■ cxD. Hence, any constant S G (0,1) in Algorithm H.l is optimal in this 
sense and results in the asymptotic rate of g, as g —)■ oo. 


Theorem 11.12. For any sequence o/{5q}^3, with Sq G (0,1), we have that 


logwg ^ ^ 
logg 


(11.32) 


for a// g G N, g > 3, and t G (0,fi]. If Sq = a, where a G (0,1) is a constant, then 

log Wq 


lim 

q^oo logg 


= 1 . 


(11.33) 


Proof. For g > 3, 

logWg ^ 1 ^ ^ 1^ ^ log logg ^ _ log(l - (^g) 2^, 

log g log g log g log g log g log g 

log f log log 

— ] - ^ -i- 

log g log g 


Since Wq is dehned only for t G (0,fi], and log(c/t) > 0 and log(c'/t) > 1 for all 
t G (0,ti], it follows that (log tCg)/log g > 1 for all g > 3, f G (0,fi], and {5g}“3. The 
proof for the second part follows from taking the limit in (11.34). □ 


4. Rate of Convergence 

The previous subsection considers the effect of the algorithm parameter <5 
on the computational work required in Algorithm H.l. This parameter dehnes the 
precision parameter through the relationship p* = (logg)/((l — S)t)-, see Step 1 of 
Algorithm H.l. In this subsection, we do not restrict Algorithm H.l to this class of 
choices for p* and consider any positive value of the precision parameter. In particular, 
we examine the progress made by Algorithm H.l after n iterations for different choices 
of p*. Since the choice may depend on n, we denote by Pn the precision parameter 
used in Algorithm H.l when terminated after n iterations. We examine the rate of 
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decay of an error bound on 'ip{xn) — ip*, and also determine the “optimal choice” of 
Pn that produces the fastest rate of decay of the error bound as n —)■ cxd. 

Suppose that Assumptions II.4 and II.6 hold. For a given bounded set S C 
let k be as in Proposition II.8 and let with xq G S', be a sequence generated by 

Algorithm II. 1 using p* = pn for some Pn > 0. Then in view of (II.7) and Proposition 
II.8, 


1p{Xn) - Ip* 


< 

< 

< 


P’pnM - 1p*p„ + 


logg 

Pn 





{iP{Xq) - Ip*) + 


21ogg 

Pn 


(11.35) 


We want to determine the “best” {pn}pp=i such that the error bound on ip{xn) — ip* 
dehned by the right-hand side of (11.35) decays as fast as possible as n —)■ cxd. We 
denote that error bound by e^, i.e., for any n G N, 





(i _ —Y + 

V Pn) Pn 


(11.36) 


We need the following trivial technical result. 


Lemma 11.13. For x G [0,1/2], —2x < log(l — x) < —x. □ 

We next obtain that asymptotically decays with a rate no faster than 1/n, 
as n —)■ cxD, regardless of the choice of pn, and that rate is attained with a particular 
choice of 


Theorem 11.14. The following statements hold for in (11.36): 

(i) For any with Pn > 1 for all n G N, liminf^^oo loge^/ logn > — 1. 

(a) If Pn = C^/logn for all n eN, with ( G (0, /c], then hm„^oo log e„/ log n = — 1. 

(Hi) If Pn = n^~^/\ogn for all n E N, with v G (0,1), then hm„^oo log e„/log n = 
-1 + z/. 
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Proof. For any n G N, we see from (11.36) that 

/ k 

log to + log I 1- 

V Pn 


log en = log l^exp 

> log ^max <|^exp 

= max I log ^exp 
Hence, for any n G N, n > 1, 
log e 


2 \ogq 
Pn 


log to + log (1 


V 


log to + n log ( 1 


V 


1--^1 

21ogg 

Pn) _ 

Pn 

k \] 

).log- 

1-- 
PnJ \ 


21ogg 

Pn 


logn 


> max 


log to pt) logp„ log 2 log log g 

1 ' ^ ? 1 ' ^ ^ 


logn 


logn 


log n log n log n 


(11.37) 


Let e > 0. Then there exists a no G N such that (loglogg)/logn > —e for all n > Uq. 
If (logp„)/logn < 1 and n > max{2,no}, then 


log Cr, 


> 


logPn log 2 log log g ^ 


logp^ 


e > -1 


e. 


, - , . , . , - , . _ ^ (11.38) 

log n log n log n log n log n 

Alternatively, suppose that (logp^)/logn > 1. Hence, n/pn < 1, and if n > 2k, then 
k/pn G (0,1/2]. Based on Lemma 11.13 and (11.37), 

k 


loge^ ^ log to ^ P„ 


^ log^o Pn 


> 


log to 2k 


(11.39) 

logn “ logn ' logn “ logn ' logn “ logn logn 
for all n > 2k such that (logp„)/logn > 1. Thus, there exists a ni > max{no,2A;} 

such that 

log to 


2k 


> -1-e 


(11.40) 


log n log n 

for all n > ni. Hence, for all n > ni, (log€„)/logn > —1 — e. Since e is chosen 
arbitrarily, the hrst part follows. Next, we prove the second part of the theorem. 
From (11.36), with pn = (n/ logn, where ( G (0, k], 


log en = log exp 


log to + n log ( 1 


/clogn 

Cn 


+ 


2 log g log n 
Cn 


(11.41) 


There exists a n 2 G N such that {k\ogn)/(n G [0,1/2] for all n > n 2 . Thus, by 
Lemma 11.13, 


log l^exp 
< log (^exp 


log to + n 
log to + n 


2k log n 
Cn 

k logn 
Cn 


2 log g log n 

^ C^^ 

2 log g log n 
Cn 


< log er, 


(11.42) 


29 



for all n > n 2 - We first consider the lower bound in (11.42), 


log ( exp 


\ogto + n 


2k logn 
(n 


+ 


2 log g logn' 


(n 


= log 


= log 


2 log q log n 

Cn 

' 2 log q log n 

Cn 


exp (log to + log n 


2 log q log n 
Cn 


+ 1 


+ log 


toCn 


1-- 


2 log q log n 


+ 1 


(11.43) 


Since ( ^ (0, k], and by continuity of the log(-) function, 


lim log 

n^oo 


toCn 


^ 


2 log q log n 



0 . 


Continuing from (11.43), and using (11.44), we obtain that 


iog(2i^)+iog(igy;+i) 

lim - - -- — 

n^oo log n 

log 2 + log log q + log log n — log ( — log n 

hm -^- 

n^oo log n 


(11.44) 


(11.45) 


Similar arguments yield that the upper bound in (11.42) also tends to —1, as n oo. 
Hence, the second conclusion follows. The third part of the theorem follows by similar 
arguments. □ 

We see from Theorem 11.14 that the “best” choice of pn is pn = (n/logn, 
with ( E (0, /c], and that choice results in an asymptotic rate of decay of error bound 
of 1/n. The constant k may be unknown as it depends on m of Assumption II.4 
and K of Assumption II.6; see (11.21). Consequently, = C^^/logn may be difficult 
to implement. Theorem 11.14 shows that the choice pn = / \ogn with a small 

v E (0,1) is almost as good (it results in asymptotic rate instead of rate 1/n) 

and is independent of k. 

Roughly speaking, a rate of decay of error bound of no better than 1/n in¬ 
dicated by Theorem 11.14 means that the required number of iterations to achieve 
an error tolerance t increases at least at rate 1/t as t approaches zero. In view of 
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Theorem II. 11, the rate 1/t is attained with the precision parameter choice in Step 
1 of Algorithm II. 1. Hence, the choice in Step 1 of Algorithm II. 1 for the precision 
parameter cannot be improved. 

Theorems II. 11 and 11.14 indicate that Algorithm II. 1 may only converge sub- 
linearly. In contrast. Theorem II.10 shows that smoothing algorithms may still be 
capable of yielding competitive run times against other algorithms when q is large due 
to low computational work per iteration. For smoothing algorithms to be competitive 
in empirical test, however, we need to go beyond the basic Algorithm II. 1 and develop 
more sophisticated, adaptive precision-adjustment schemes as discussed next. 

D. SMOOTHING ALGORITHMS AND ADAPTIVE PRE- 
GISION ADJUSTMENT 

The previous section shows that the choice of precision parameter influences 
the rate of convergence, since the degree of ill-conditioning in (FMXp) depends on 
the precision parameter. This section presents two smoothing algorithms with novel 
precision-adjustment schemes for (FMX). The results in Polak et ah (2003) and 
our preliminary numerical tests strongly indicate that adaptive precision-adjustment 
schemes are superior to static and open-loop schemes in their ability to avoid ill- 
conditioning. Thus, we focus on adaptive precision-adjustment schemes in our smooth¬ 
ing algorithms. 

The first algorithm. Algorithm II.2 follows Algorithm 3.2 in Polak et al. (2008), 
but uses a much simpler scheme for precision adjustment. The second algorithm. 
Algorithm II.3, adopts a novel line-search rule that aims to ensure descent in ip{-) 
and, if that is not possible, increases the precision parameter. Previous smoothing 
algorithms (Polak et ah, 2003, 2008) do not check for descent in '?/’(■)• The new 
algorithms implement active-set strategies adapted from Polak et al. (2008). 

We use the following notation. The e-active set, e > 0, is denoted by 

Q,{x) = {j e Ql'ipix) - f{x) < e}. (11.46) 
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As in Algorithm 3.2 of Polak et al. (2008), we compute a search direction using a 
d X d matrix Bp^{x). We consider two options. When 

Bpn{x) = J, (11.47) 

the d X d identity matrix, the search direction is equivalent to the steepest descent 
direction. When 

Bpn{x) = r]pn{x)I + Hpn{x), (11.48) 

the search direction is a Quasi-Newton direction, where 

yieo Vjeo / Vieo 

(11.49) 

ripn{x) = max{0, - epa{x)}, (11.50) 

9 ? > 0, and epQ_{x) is the smallest eigenvalue of Hpn{x). The quantity rjp^^x) ensures 
that Bp^{x) is positive dehnite. The Quasi-Newton direction given in (II.48)-(II.50) 
is adopted from Polak et al. (2008). Polak et al. (2008) observe that when p —)■ cxd, 
the hrst term in the Hessian function (II. 10) becomes negligible, thus they ignore the 
hrst term. 

We next present the two algorithms and proofs for their convergence. 

1. Smoothing Algorithm Based on Optimality Func¬ 
tion 

We hrst consider the following smoothing algorithm, with a simple adaptive 
precision-adjustment scheme. 

Algorithm II.2. 

Data: xq G 

Parameters and Auxiliary Functions: a,/? e (0,1),Po > = (101ogg)/po, 

function Bp^{-) as in (11.47) or (11.48), Cq > 0,.^ > 1,<^ > 1, p > 1. 



32 



Step 1. Set i = 0,j = 0,f2o = Qeo(^o)- 

Step 2. Compute the search direction hp.n^{xi) by solving 



Bp^n^{xi)hp^Q.[xi) — 'V'ipp^n^{xi). 

(11.51) 

Step 3. 

Compute the stepsize A* = where ki is the largest integer k such that 


+/d hp^Q^{Xi)) 'tppiVliiXi) < 0.(5 II^PiOi(2^i)ll 

(11.52) 

and 

^PiQiixi + /3^hp^n^{xi)) -^jJ{xi + /3^hp^n^{xi)) > -u. 

(11.53) 

Step 4. 

Set 



Xi+i =Xi + 

(11.54) 


ffj U ( 3 ^ 1 + 1 ). 

(11.55) 


Step 5. Enter Subroutine II. 1, and go to Step 2 on exit from Subroutine II. 1. 

Subroutine II. 1. Adaptive Precision-Parameter Adjustment using Optimality Func¬ 
tion 


If 

9p^nAxi+i) > -ei, (11.56) 

set X* = Xi+i, set pi+i = ^pi, set Cj+i = e*/^, replace f by f -f 1, replace j by j + 1, 
and exit Subroutine II. 1. 

Else, set pj+i = Pi, set e^+i = e*, replace f by i -|- 1, and exit Subroutine II.1. □ 

Steps 1 to 4 of Algorithm II.2 are adopted from Algorithm 3.2 of Polak 
et al. (2008). We note the unusual choice of the right-hand side in (11.52), where 
— ||hp.Q.(a;i)|p is used instead of the conventional (V'^p.Q.(a;i), hp.Q.(xj)). Test runs 
show that Algorithm II.2 with —\\hp.Q.{xi)\\‘^ is slightly more efficient than with the 
conventional (V'^p.Q.(a;i), hp.Q.(xj)). To allow direct comparison with Algorithm 3.2 
of Polak et al. (2008), we use —\\hp^fl^{xi)\\‘^ in Algorithm II.2. 
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The test in (11.53) prevents the construction of a point Xj+i where 
is much greater than 'ip{xi) during the early iterations when the set hi* is small; see 
Polak et ah (2008). 

The key difference between Algorithm II.2 and Algorithm 3.2 of Polak et al. 
(2008) is the simplihed scheme to adjust Pi in Subroutine II. 1. This difference calls 
for a different proof of convergence as compared to Polak et al. (2008), and will be 
based on consistent approximation; see Section I.D.3. Let V denote an increasing 
sequence of positive real numbers that approach inhnity. 

The following result shows that the pairs ((FMXpn), 6 'pq(-)) in the sequence 
{((FMXp^), 9pn{-))}p£r are indeed consistent approximations to ((FMXn), This 

is subsequently used in the proof of convergence of Algorithm II.2. 

Theorem 11.15. Suppose that Assumption II-4('^) holds. Then for any hi C N, 
the pairs ((FMXpn), 6 'pq(-)) in the sequenee {((FMMPpn), 6'pQ(-))}pep are eonsistent 
approximations to ((FMXq), 6*q(-)). 

Proof. We follow the proofs of Lemmas 4.3 and 4.4 in Polak (2003), but simplify 
the arguments as Polak (2003) deals with min-max-min problems. According to 
Theorem 3.3.2 of Polak (1997), (FMXp^n) epi-converges to (FMXn), as i —)■ cxd if 
and only if (i) for any x* G there exists a sequence with Xi G such 

that Xi —)■ X* and — )■ oo, as i — )■ oo, limsupj^oo'^p.Q(a;j) < 'ipn{x*) and (ii) for 
any sequence such that Xi G Xi ^ x* E M'^, and pi ^ oo as i ^ oo, 

liminii^oo'ippinixi) > i)n{x*). 

(i) Let X* G Construct a sequence {xjjjLg, where Xi = x* for all i. 

Obviously, Xi ^ x* as i ^ oo. According to Proposition ILl(ii), 'ippn{x) —)■ %Ijq{x), 
as p — )■ oo, this implies that limsupp^o^'^pn(a^*) = hminfp^oot/’po(a^*) = 'f’ni.x*). 
Therefore, hmsupi^^^/>Pio(a:i) = \irasupi^^fjp^n{x*) = fjn{x*). 

(ii) Let and {pi}°Zo be arbitrary sequences such that Xi ^ x*, x* E 

and pi ^ oo, as i ^ oo. For any t > 0, there exists by continuity of 

an io such that 'ipni.Xi) — 'ipn{x*) < | for all i > io- Moreover, from (IL7), there 
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exists an ii such that 'ipp^Q,{x) — | for all i > ii. Then for all 

i > m&yi{io,il),ilp^n{xi) - i)n{x*) = iipin{xi) - i)n{xi) +i)n{xi) - i)n{x*) < t. Hence, 

i^Pin{xi) iin{x*). 

We next consider the optimality functions. Let C and {pi}'^Q,Pi > 

0 for all i, be arbitrary sequences and x* E be such that Xi —)■ x* and Pi ^ oo, 
as i —)■ cxD. Since p^{x) E (0,1) for any j G hi, p > 0, and x E ^ 

bounded sequence in and, according to the Bolzano-Weierstrass Theorem, there 
exists at least one convergent subsequence. For every such subsequence K C Nq, there 
exists a poo G such that Pp^Xi) — Poo, as i —)■ cxd. Moreover, since poo E Sn, 
Xljeo f^oo — 1- 

If j ^ r2(a;*), then there exist a t > 0 and io ^ hi such that f^{xi)—'ipfi{xi) < —t 
for all i > Iq. Hence, from (H.9), p^.{xi) —)■ 0, as i —)■ cxd, and therefore = 0. By 
continuity of Vp (-), j E H, 

jeo 

as i —)■ cxD. Since poo E Sq and /i^, = 0 for all j ^ we hnd in view of (11.11) 

that 

0^n(x-) = - fHx')) -\\\Y.liL'^P(x-)f < 0„(x-). ( 11 , 58 ) 

jeo ieo 

This completes the proof. □ 

The next result is identical to Lemma 3.1 in Polak et al. (2008). 

Lemma 11.16. Suppose that <Z is a sequence constructed by Algorithm 

II. 2. Then there exists an i* E No and a set 12* C Q such that the working sets 12* 
satisfy 12 * = 12 * for all i > i*. 

Proof. By construction, 12* C 12*+i for all i E Nq. Since the set Q is hnite, the lemma 
must be true. □ 

The following result ensures convergence of Algorithm If.2. 
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Theorem 11.17. Suppose that Assumption IL4(i) holds. Then any accumulation 
point X* G of a sequence {x*}fSQ C constructed by Algorithm II.2 satisfies the 
first-order optimality condition 9{x*) = 0. 

Proof. Let Q* O Q and i* G No be as in Lemma 11.16, where 12* = 12* for all 
i > i* . As Algorithm 11.2 has the form of Master Algorithm Model 3.3.12 in Polak 
(1997) for all i > i*, we conclude based on Theorem 3.3.13 of Polak (1997) that any 
accumulation point x* of a sequence {x*}flQ constructed by Algorithm 11.2 satisfies 
9n*{x*) = 0. The assumptions required to invoke Theorem 3.3.13 in Polak (1997): 

(i) Continuity of'^n*('))l(’pO*(')) and 6'pQ*(-), p > 0, which follows by Assump¬ 

tion IL4(i), Proposition ILl(i), Theorem 2.1.6 of Polak (1997), and Proposition 
ILl(iii), respectively. 

(ii) The pairs ((FMXp^*), 6'pn‘(-)) in the sequence {((FMXp^*), 6'^^* (•))}p 69 [:> are con¬ 
sistent approximations to ((FMX^^* ))( •)), which follows by Theorem 11.15. 

(hi) If Steps 1 to 4 of Algorithm 11.2 are applied repeatedly to (FMXp^*) with a fixed 
p > 0, then every accumulation point a; of a sequence {xk}^!^ constructed must 
be a stationary point of (FMXp^*); i-e., 0pn*{x) = 0, which follows by Theorem 
3.2 in Polak et al. (2008). 

Since 0q*{x*) = 0, from (11.11), there exists a p G such that 

2 

jeo* ieo* 

Let TT G Sq, Tft = 0 for j G Q — f2*, and = pt for j G 12*. Thus, it follows from 
(11.11) that 

2 

9{x*) > — 7r-^('^(a;*) — f\x*)) — | (x*) = 0. (11.60) 

jeQ j&Q 

Since 9{-) is a nonpositive function, the result follows. □ 

2. Smoothing Algorithm Using Cost Descent 

Next, we consider the second smoothing algorithm, which determines the step- 
size based on the actual function '^(•) rather than the smoothed function '^po(-). 
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Algorithm II.3. 

Data: xq E 

Parameters and Auxiliary Functions: a,/3 E (0,1), function Bp^{-) as in (11.47) 
or (11.48), e > 0, 9 ? > l,Po > 1,P 3> Po, 3> 1, C > 1) 7 > 0, e (0,1), Ap > 1. 

Step 0. Set f = 0, flo = Qei^o), k_i = 0. 

Step 1. Compute Bp.^^{xi) and its largest eigenvalue cr^^^(xi). If 

compute the search direction 


hp^nX^i) = -V'lpp.n.ixi). 

Else, compute the search direction hp^Q^{xi) by solving the equation 

Bp.^^{xi)hp^nAxi) = 


(11.62) 


(11.63) 


Step 2a. Compute a tentative Armijo stepsize based on working set hlj, starting 
from the eventual stepsize of the previous iterate /cj-i, i.e., determine 


\iaAxi) = max 

s.t. Api^Axi + A^hp.nAxi)) - Api^Axi) < aAAVApinA^i), hp^nAxi))- (11.64) 


Set 


Vi = Xi + A^hp^^A^i). 


(11.65) 


Step 2b. Forward track from Ui along direction hp^^Axi) as long as A{') continues 
to decrease using the following subroutine. 

Substep 0. Set /' = /, 


Ziv = Xi + A^'hp^nA^i) and ZiP-i = Xi + A^' ^hp^nAxi)- 


( 11 . 66 ) 


Substep 1. If 


A{zii>-i) < AiziiA, 


(11.67) 
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replace /' by /' — 1, set zu'^i = x* + ^3’‘'~^hp^Q^{xi), and repeat Substep 1. 
Else, set Zi = zw. 

Substep 2. If Pi < p, go to Step 3. Else, go to Step 4. 

Step 3. If 

ip{zi) - p;{xi) < —( 11 . 68 ) 

set Xi+i = Zi,pi+i = Pi, ki = I', set flj+i = fij U Qe{xi+i), replace f by f + 1, and go to 
Step 1. 

Else, replace Pi by ^Pi, replace flj by fl* U Q^{zi), and go to Step 1. 

Step 4. If (11.68) holds, set Xj+i = Zi, ki = I', set pi+i = Pi + Ap, set flj+i = 
rij U Qe{xi+i), replace f by i + 1, and go to Step 1. 

Else, set Xi+i = Pi, ki = /, set pj+i = Pi + Ap, set flj+i = flj U Q^{xi+i), replace i by 
i + 1, and go to Step 1. □ 

As is standard in stabilized Newton methods (see for example Section 1.4.4 
of Polak, 1997), Algorithm II.3 switches to the steepest descent direction if i?po(') is 
given by (11.48) and the largest eigenvalue of i?po(-) is large; see Step 1. Compared 
to Algorithm 3.2 in Polak et al. (2008), which increases p when || V'^p.Oi(a^i)ll i® small. 
Algorithm II.3 increases the precision parameter only when it does not produce suffi¬ 
cient descent in '^(•), as verihed by the test (11.68) in Steps 3 and 4 of Algorithm II.3. 
A small precision parameter may produce an ascent direction in '^(•) due to the poor 
accuracy of 'ipp^nP-). Thus, insufficient descent is a signal that the precision param¬ 
eter may be too small. All existing smoothing algorithms only ensure that (•) 
decreases at each iteration, but do not ensure descent in '^(•). Another change com¬ 
pared to Polak et al. (2003, 2008) relates to the line search. All smoothing algorithms 
are susceptible to ill-conditioning and small stepsizes. To counteract this difficulty. 
Algorithm II.3 moves forward along the search direction starting from the Armijo 
step, and stops when the next step is not a descent step in '^(•); see Step 2b. 

Algorithm II.3 has two rules for increasing pi. In the early stages of the 
calculations, i.e., when pi < p, if sufficient descent in '^(•) is achieved when moving 
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from Xi to Zi ((11.68) satisfied), then Algorithm II.3 sets the next iterate Xj+i to Zi, 
retain the current value of the precision parameter as progress is made towards the 
optimal solution of (FMX). However, if (11.68) fails, then there is insufficient descent 
and the precision parameter or the working set needs to be modihed to generate a 
better search direction in the next iteration. In late stages of the calculations, i.e.. 
Pi > p, Algorithm II.3 accepts every new point generated, even those with insufficient 
descent, and increases the precision parameter with a constant value. 

The next lemma is similar to Lemma 11.16. 

Lemma 11.18. Suppose that <Z is a sequence constructed by Algorithm 

11.3. Then there exists an i* G Nq and a set Q* G Q such that the working sets 
satisfy fli = and = 'ip{xi) for all i > i*. 

Proof. The first part of the proof follows exactly from the proof for Lemma 11.16. 
Next, since Q{xi) C for all i; see Steps 3 and 4 of Algorithm 11.3, t/’o*(a^i) = t/’(a^i) 
for alH > i*. □ 

Lemma 11.19. Suppose that Assumption II.4(i) holds, and that the sequences C 

and {pi}°Zo T M are generated by Algorithm II.3. Then the following properties 
hold: (i) the sequence monotonically increasing; (ii) if the sequence 

has an accumulation point, then p* —)■ cx) as i ^ oo, and ^/Pi ~ +oo. 

Proof. We follow the framework of the proof for Lemma 3.1 of Polak et ah (2003). 
(i) The precision parameter is adjusted in Steps 3 and 4 of Algorithm 11.3. In Step 
3, if (11.68) is satisfied, then pj+i = pp, if (11.68) fails, p* is replaced by fpi > Pi. In 
Step 4, pi+i = Pi + Ap > Pi + 1 > Pi. 

(ii) Suppose that Algorithm 11.3 generates the sequence {xijjTQ with accumu¬ 
lation point X* G but {pijjLg is bounded from above. The existence of an upper 
bound on pj implies that p* < p for all i G Ng, because if not. Algorithm 11.3 will 
enter Step 4 the first time at some iteration G Ng, and re-enter Step 4 for all i > i', 
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and Pi —)■ cxD as i —)■ cxD. Thus, the existence of an upper bound on pi implies that 
Algorithm II.3 must never enter Step 4. 

The existence of an upper bound on pi also implies that there exists an iteration 
i* G No such that (11.68) is satished for all i > i*, because if not, Pi will be replaced by 
^Pi repeatedly, and pj ^ cxd as i —)■ cx). This means that < —j/pi^ for 

all i > i*. Since Pi < p for all i G No, —oo as i —)■ cxd. However, by continuity 

of '?/’(•)) X* being an accumulation point, 'ip{xi)^^'ip{x*), where A' C No is some 
inhnite subset. This is a contradiction, so Pi oo. 

Next, we prove that ~ +oo. Since pi oo, there exist an iteration 

i* G No such that pi > p for all i > i*. This means that the precision parameter is 
adjusted by the rule pj+i = Pi + Ap for all i > i*. The proof is complete by the fact 
that = oo. □ 

Lemma 11.20. Suppose that Assumption II.4('i) holds. Then for every bounded set 
S' C and parameters a,/3 E (0,1), there exist a K < oo sueh that, for all p > I, 
Q C Q, and x E S, 

ippn{x + XpQ{x)hpQ{x)) - ippn{x) < ^ (11.69) 

where Xpn{x) is the stepsize defined by (Il.Of) and hpQ,{x) is the search direction as 
defined by (11.62) or (11.63), with Pi replaced by p, replaced by and Xi replaced 
by X. 

Proof. If hpQ^x) is given by (11.63) with Bp^{x) as in (11.47), then the result follows 
by the same arguments as in the proof for Lemma 3.2 of Polak et al. (2003). If 
hpn{x) is given by (11.63) with BpQ_{x) as in (11.48), then the result follows by similar 
arguments as in the proof for Lemma 3.4 of Polak et al. (2003), but the argument 
deviates to account for (i) the lower bound on the eigenvalues of Bp^{x) takes on the 
specihc value of 1 in Algorithm 11.3, and (ii) we consider an arbitrary Q O Q. 

Based on Assumption IL4(i), (11.8), and the assumption that S' is a bounded 
set, there exists a constant M < oo such that |lV'^pQ(a;)|| < M, for all p > 1, H C Q, 
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and X ^ S. Let 

Sb= {x e M'^|||a;-a;'|| < M,x’ e S} , (11.70) 

and L < cxD be the constant corresponding to such that (11.15) holds for all 
a; G S's, 1 / G and p > 1. For any real d x d matrix A, let ||yl|| denote its induced 
matrix norm as dehned on p. 20. If A is symmetric, 

Pll = (11.71) 

whenever > 0, where is the largest eigenvalue of A] see for example p. 3 of 
Lang (2000). Now, suppose that p > I, ^ ^ Q, and x E S are such that hpQ,{x) = 
— BpQ^{x)~^V'iljpQ,{x). Since all induced norms are consistent by dehnition, ||hpQ(a;)|| < 
l|Bpn(i)"'lll|V^,n(i)||. 

By construction, BpQ{x) is symmetric and positive dehnite as the minimum 
eigenvalue of Bp^{x) is 1, because (p > 1, and based on (11.50). Thus, BpQ,{x)~^ 
is symmetric and positive dehnite; see for example Bertsekas, Nedic, and Ozdaglar 
(2003, p. 16). Hence, using the fact that the eigenvalues of an inverse matrix are the 
reciprocals of the eigenvalues of the original matrix, and (11.71), 

IIV(2^)ll < l|B,«(2^)^‘lll|VVvn(i^)ll = . (11.72) 

and 

(VV.,„(:r),V(x))<-!!^(^, (IL73) 

^pQ. 

where (j'^^{x) and are the smallest and largest eigenvalues of Bp^i^x) respec¬ 

tively. From Step 1 of Algorithm 11.3, we see that the direction in (11.63) is selected 
only when cr^^ix) < k, and by construction according to (11.50), cT™(a;) > 1. Hence, 
from (11.72) and (H.73), 

IIV(i^)ll < l|V<fpn(i)|l (11.74) 

and 

(V^pn(x), hpn{x)) < (11.75) 

tv 
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It follows directly from (11.62) that (11.74) and (11.75) also hold when hpfi{x) = 

-V^/’po(a;). 

Next, for all A G (0,1],a; G S and p > I, using the Mean-Value Theorem 
(see for example Section 5.1.28 of Polak, 1997) and Lemma II.7, we have for some 
s G [0,1], 


i^pnix + \hpn{x)) - i)pn{x) - a\(^^ppn{x), hpn{x)) 

= A(1 - a){V'4)pn{x), hp^{x)) + ^A^ {hpQ{x), {x + s\hp^{x)) hpQ{x)) 


< A(1 - a){Vi)pn{x), hpn{x)) + \\^pL\\hpn{x 


< -A||V^,o(a;)|p 


1 — a 


K 


\\pL 


Let 


A 


A* = min < 1, 


2(1-a) 
pLk 


(11.76) 


(11.77) 


Then it follows from (11.77) that, for every A G (0, A*], we have 


'ippuix -h \hpn{x)) - 'ippnix) - aA(V^/'po(a;), hpn{x)) < 0. (11.78) 

Hence, by (11.78) and the stepsize rule in (11.64), 


\pn{x) > /3\* 


(11.79) 


for all p > 1, n C Q, and x E S. Consequently, by (11.64) and (11.79), we have that 


i^pnix + Xpnix)hpn{x)) - ^/jpn{x) 
< -a\pn{x){V'ippn{x),hpn{x)) 


< —a min < I3, 


2/3(1 -a) \ WV-ippnix 


pLn 


K 


for all p > 1, n C Q, and x E S. Hence, the conclusion follows with 

2/3(1-a) 


K = min < /3, 


Lk 


This completes the proof. 


(11.80) 


(11.81) 

□ 


42 



Lemma 11.21. Suppose that Assumption IL4(i) holds and that <Z MA is a 

bounded sequence generated by Algorithm II. 3. Let 12* C Q and i* G No be as in 
Lemma 11.18, where 12^ = 12* for all i > i*. Then there exists an accumulation point 
X* E of the sequence such that 9n*{x*) = 0. 

Proof. For the sake of contradiction, we assume that there exist a p > 0 such that 

liminf > p. (11.82) 

2^00 

Since is a bounded sequence, it has at least one accumulation point according 

to the Bolzano-Weierstrass Theorem. Hence, by Lemma 11.19, pi — )■ oo, as i —)■ cxd. 
Consider two cases, Xi+i = Pi or Xj+i = Zi in Algorithm II.3. 

If Xj+i = Pi, by Lemma 11.20, there exists an M < oo such that 

^ ) (11.83) 

Pi 

for i > i*. Hence, 


{Xi+i) - {Xi) = (Xi+i) - f)p^u- {Xi+i) + f)p^n* (a^i+i) - {xi) 


< 


aM||V^p^n.(a;,)||- 

Pi 


(11.84) 


for i > i*, where we have used the fact from Proposition H.l(i) that 


f/’pi+iO* (2^1+1) ^ f/’piO* (2^1+1)) 


(11.85) 


for i > i*, because p^+i > Pi from Lemma 11.19. 

Next, if Xj+i = Zi, then (11.68) is satisfied. It follows from (H.7) and Lemma 
11.18 that. 


i’pi+iu* (a^i+i) - ippiU* {xi) < ipn* {xi+i) + ^ {xi) 

Pi+l 

= fj{Xi+i) + ^ - fj{Xi) 

Pi+l 

^ ^ log 1^1 

“ pA Pi 
_ -7 + pA~^log|12*| 

Pf 


( 11 . 86 ) 
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From (11.84) and (11.86), for all i > i*, 


'ippi+iQ^^i+i) - i^Pin*{xi) < max 




Pi 


Pi 


(11.87) 

By Proposition Il.l(iii), ||V'^p.o»(a;i)|| is bounded because is bounded. Since 

u G (0,1), there exists an i** G Ng, where i** > i*, such that 


aM\\Vijp^n*{xi)f ^ -'f + Pi" Mog|ff 


Pi 


Pt 


for all i > i**. Therefore, from (11.87), 

i^Pi+in*{xi+i) -%in*{xi) < 


aM\\V'ipp^n*{xi)f 

Pi 


( 11 . 88 ) 


(11.89) 


for all i > i**. Since by Lemma 11.19, ~ +oo, it follows from (11.84) and 

(11.89) that 

i^Pin^ixi) -oo, as i ^ oo. (11.90) 


Let X* be an accumulation point of {xjj^g. That is, there exists an inhnite subset 
77 C No such that Xi^^x*. Based on (IL7), Lemma 11.19, and continuity of 
it follows that 'ippin*{xi)^^'ipn*{x*), as i —)■ cxd, which contradicts (11.90). Hence, 
liminfj^oo ||V'^p.Q»(a;i)|| = 0. Consequently, there exists an inhnite subset K* C Ng 
and an x* G such that Xi x* and 9p.n*{xi) 0, as i ^ cxd, which implies 

that limsupj^oo 6*p.Q»(a;i) > 0. From Dehnition 1.3, Theorem 11.15, and the fact that 
0 Q *(-) is a nonpositive function, 6q,*{x*) =0. □ 


Theorem 11.22. Suppose that Assumption II.4(i) holds, (i) If Algorithm II.3 con¬ 
structs a bounded sequence {xjj^g C then there exists an accumulation point 
X* G of the sequence {xjj^g that satisfies 9{x*) = 0. (ii) If Algorithm II.3 con¬ 
structs a finite sequence {a;i})lg C then Step 2h constructs an unbounded infinite 
sequence with 




(11.91) 


for all I' G {1,1 — 1,1 — 2,...}, where I is the tentative Armijo stepsize computed in 
Step 2a. 
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Proof. First, we consider (i). Let the set C Q be as in Lemma 11.18, where 
= 11* for all i > i*. Based on Lemma 11.21, there exists an accumulation point of 
the sequence x* G such that 9q*{x*) = 0. The conclusion then follows by 

similar arguments as in Theorem 11.17. 

We next consider (ii). Algorithm 11.3 constructs a hnite sequence only if it jams 
in Step 2b. Then Substep 1 constructs an inhnite sequence satisfying (11.91) 

for all /' G {1,1 — 1,1 — 2,...}. The inhnite sequence is unbounded since hp^Q^{xi) ^ 0 
as (11.91) cannot hold otherwise, and /d G (0,1). □ 

3. Complexity 

Next, we consider the complexity in q for a hxed d G N of Algorithms 11.2 and 
11.3 to achieve a near-optimal solution of (FMX). Suppose that all functions are 
active, i.e., flj = Q, near an optimal solution. If i?po(‘) is given by (11.47), then the 
main computational work in each iteration of Algorithms 11.2 and 11.3 is the calcu¬ 
lation of V'^p(-), which takes 0(g) arithmetic operations under Assumption 11.9; see 
the proof of Theorem 11.10. If BpQ{-) is given by (11.48), then the main computational 
work is the calculation of (11.48) and hpn{x). Under Assumption 11.9, it takes 0(g) 
arithmetic operations to compute fi^p(x), for all j G Q, 0{q) to compute Vf^{x), for 
all i G Q, 0(g) to sum f^{x)Vf^{xY, 0(g) to sum Ejsq 

and the other operations take 0(1). In all, the number of arithmetic operations to 
obtain Bp^{x) is 0(g). A direct method for solving a linear system of equations to 
compute hpQ,{x) depends on d, but is constant in g. Hence, if Bp^{-) is given by (11.48), 
the computational work in each iteration of Algorithms 11.2 and 11.3 is 0(g). It is 
unclear how many iterations Algorithms 11.2 and 11.3 would need to achieve a near- 
optimal solution as a function of g. However, since they may utilize Quasi-Newton 
search directions and adaptive precision adjustment, there is reason to believe that 
the number of iterations will be no larger than that of Algorithm H.l, which uses the 
steepest descent direction and a hxed precision parameter. Thus, suppose that for 
some tolerance t > 0, the number of iterations of Algorithms H.2 and H.3 to generate 
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{2^i}r=o; with the last iterate satisfying 'ip{xn) — ip* < t, is no larger than O(logg), 
as is the case for Algorithm ILL Then the complexity of Algorithms II.2 and II.3 to 
generate Xn is no larger than O(glogg), which is the same as for Algorithm II.1. 

E. NUMERICAL RESULTS 

We present an empirical comparison of Algorithms II.2 and II.3 with algo¬ 
rithms from the literature over a set of problem instances from Polak et ah (2003); 
Zhou and Tits (1996) as well as randomly generated instances; see Appendix A. This 
study appears to be the hrst systematic comparison of smoothing and SQP algorithms 
for large-scale problems, with number of functions q up to two orders of magnitude 
larger than previously reported. Specihcally, we examine: 

(i) PPP. Pshenichnyi-Pironneau-Polak min-max algorithm (Algorithm 2.4.1 in Po¬ 
lak 1997). 

(ii) e-PPP. An active-set version of PPP as stated in Algorithm 2.4.34 in Polak 
(1997); see also Polak (2008). 

(hi) SQP-2QP. Algorithm 2.1 of Zhou and Tits (1996), an SQP algorithm with two 
QPs. 

(iv) SQP-IQP. Algorithm A in Zhu et al. (2009), a one-QP SQP algorithm. 

(v) SMQN. Algorithm 3.2 in Polak et al. (2008), a smoothing Quasi-Newton algo¬ 
rithm. 

(vi) Algorithms II.2 and II.3 of the present chapter. 

We refer to Appendix B for details about algorithm parameters. With the ex¬ 
ception of PPP and SQP-IQP, the above algorithms incorporate active-set strategies 
and, hence, appear especially promising for solving problem instances with large q. 
We implement and run all algorithms in MATLAB version 7.7.0 (R2008b) (see Math- 
works 2009) on a 3.73 GHz PC using Windows XP SP3, with 3 GB of RAM. All QPs 
are solved using TOMLAB CPLEX version 7.0 (R7.0.0) (see Tomlab 2009) with the 
Primal Simplex option, which preliminary studies indicate result in the smallest QP 
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run time. We also examined the LSSOL QP solver (see Gill, Hammarling, Murray, 
Saunders, & Wright, 1986), but its run times appear inferior to that of CPLEX for 
large-scale QPs arising in the present context. 

Algorithm 2.1 of Zhou and Tits (1996) is implemented in the solver CFSQP 
(Lawrence, Zhon, & Tits, 1997) and we have verihed that onr MATLAB implementa¬ 
tion of that algorithm produces comparable results in terms of number of iterations 
and run time as CFSQP. We do not directly compare with CFSQP as we hnd it more 
valuable to compare different algorithms using the same implementation environment 
(MATLAB) and the same QP solver (CPLEX). 

For Algorithm 11.3, nnless otherwise stated, we use the Quasi-Newton direction 
with BpQ{x) as dehned in (11.48), because preliminary test runs show that generally, 
the alternate steepest descent direction with Bp^{x) as dehned in (11.47) prodnces 
longer rnn times. We examine all problem instances from Polak et al. (2003); Zhou 
and Tits (1996) except two that cannot be easily extended to large q. As the problem 
instances with many variables in Polak et al. (2003); Zhou and Tits (1996) do not 
allow us to adjust the number of functions, we create two additional sets of problem 
instances; see Appendix A for details. We report rnn times to achieve a solution x 
that satishes 

^/J{x) - < t, (11.92) 

where is a target value (see Table 17 of Appendix A) equal to the optimal 

value (if known) or a slightly adjusted value from the optimal values reported in 
Polak et al. (2003); Zhou and Tits (1996) for smaller q. We use t = 10“®. Although 
this termination criteria is not possible for real-world problems, we hnd that it is the 
most usefnl criterion in this stndy. 

Before we can compare the rnn times of the various algorithms, we need to 
conduct sensitivity analysis to determine a robust setting (one that produces the 
fastest run times for majority of the problem instances) for the parameter e (see 
(11.46)) to use for the active-set strategies. 
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1. Selection of a Robust e for Active-Set Algorithms 

Of the algorithms compared, e-PPP, SQP-2QP, SMQN, Algorithms II.2 and 
II.3 implement some form of active-set strategies. The performance of these active-set 
algorithms depend on the parameter e, which dehnes an e-active set at each iteration. 
However, as e is not used exactly the same way in the different algorithms, we do not 
expect the robust e setting to be the similar for the different algorithms. 

For the sensitivity analysis, we use the same set of problem instances ProbC, 
ProbG, and ProbL (see Appendix A) for all active-set algorithms. The three problem 
instances have different problem dimensionality d, which we hope contribute a robust 
setting for e. We include the non-convex ProbG (see Table 17 of Appendix A) to 
ensure that the chosen e is robust for both convex and non-convex problems. 

The number of objective functions for each test problem, q is set as high as 
possible (in powers of 10), without encountering memory problems for any of the 
algorithms. For each problem instance and active-set algorithm, we determine the 
run times with e = 1000,100,..., 10“^*^. We present a representative sample of the run 
times, leaving out (i) those run times that do not change much when we decrease e by 
a factor of 10, and (ii) those run times that are signihcantly longer than the fastest 
run time. 

a. Selection of a Robust e for e-PPP 

Table 1 indicates that the performance of the algorithm e-PPP is sen¬ 
sitive to e, and there is no single value of e that is consistently better for the three 
problem instances considered. The word “local” indicates that the algorithm con¬ 
verges to a locally optimal solution for the non-convex ProbG. The run times with 
e = 10“^ to e = 10“^ seem to be consistently better than other settings, and we will 
use e = 10“^ for the algorithm comparison study. 

b. Selection of a Robust e for SQP-2QP 

Table 2 indicates that the performance of the algorithm SQP-2QP is 
relatively insensitive to different e values. We use e = 1 for the algorithm comparison. 
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Probe 

ProbG 

ProbL 

d 

2 

4 

4 X 102 

Q 

103 

103 

102 

e = 1000 

869.7 

local 

315.1 

e = 100 

540.5 

local 

243.3 

e = 10 

350.9 

local 

190.9 

e = 1 

71.7 

local 

140.8 

O 

1 

35.5 

local 

101.0 

e = 10-2 

5.1 

local 

79.2 

e = 10-3 

5.0 

local 

79.7 

e = 10-4 

3.1 

local 

104.9 

e = 10-3 

3.1 

local 

197.1 

o 

1 

o 

31.5 

local 

4246 

e = 10-13 

> 7200 

local 

> 7200 

O 

1 

to 

o 

> 7200 

local 

> 7200 


Table 1. Run times based on e for e-PPP. The word “local” means that the algorithm 
converges to a locally optimal solution that does not satisfy (11.92), which may occur 
for non-convex problems. 



Probe 

ProbG 

ProbL 

d 

2 

4 

4 X 102 

Q 

103 

103 

102 

e = 1000 

1.7 

2.7 

21.5 

e = 100 

0.85 

2.4 

21.4 

e = 10 

0.74 

2.5 

21.4 

e = 1 

0.67 

2.4 

15.1 

O 

1 

0.71 

2.5 

15.0 

e = 10-3 

0.76 

3.2 

14.3 

e = 10-40 

0.72 

3.2 

14.2 

e = 10-43 

0.76 

3.2 

14.4 

e = 10-20 

0.68 

3.1 

14.3 


Table 2. Run times based on e for SQP-2QP. 
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as it provides consistently fast run times as seen in Table 2, and it is also the proposed 
value in Zhou and Tits (1996). 

c. Selection of a Robust e for SMQN and Algorithm II.2 

Algorithm II.2 is very similar to SMQN, the only difference being the 
schemes for precision-parameter adjustment. Due to their similarity, we conduct the 
sensitivity analysis with only SMQN, but apply the resulting e to both algorithms for 
the algorithm comparison. 



Probe 

ProbG 

ProbL 

d 

2 

4 

4 X 10^ 

Q 

105 

105 

10 ^ 

e = 1000 

152.6 

105.2 

584.8 

e = 100 

152.5 

105.5 

571.3 

e = 10 

153.0 

103.6 

845.3 

e = 1 

140.0 

116.5 

547.8 

O 

1 

112.0 

108.2 

153.2 

e = 10-5 

83.9 

216.3 

113.9 

e = 10-^° 

11.8 

31.2 

113.9 

e = 10-^5 

12.2 

29.8 

114.1 

e = 10-^" 

12.6 

25.3 

114.0 


Table 3. Run times based on e for SMQN and Algorithm II.2. 


Table 3 provides a clear indication that a small e provides the fastest 
run times for SMQN consistently. There is no recommended setting for the parameter 
e in Polak et al. (2008). We select e = 10“^° for SMQN and Algorithm II.2 for the 
algorithm comparison. 

d. Selection of a Robust e for Algorithm II.3 

Table 4 indicates that the performance of Algorithm II.3 is sensitive to 
the value of e and there is not a single e value that is optimal for the three problem 
instances selected. Similar to SMQN and Algorithm II.2, we use e = 10“^° for the 
algorithm comparison. 
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Probe 

ProbG 

ProbL 

d 

2 

4 

4 X 10^ 

q 

105 

105 

O 

1 

e = 1000 

5.4 

local 

0.34 

e = 100 

5.4 

local 

0.35 

e = 10 

3.7 

local 

0.34 

e = 1 

4.3 

local 

0.77 

O 

1 

3.0 

local 

3.4 

e = 10-5 

3.5 

557.4 

4.3 

e = 10-^" 

0.96 

27.6 

4.2 

e = 10-^5 

1.2 

22.3 

4.1 

e = 10-^" 

1.3 

20.1 

4.6 


Table 4. Run times based on e for Algorithm II.3. The word “local” means that the 
algorithm converges to a locally optimal solution that does not satisfy (11.92), which 
may occur for non-convex problems. 


In view of the above sensitivity analyses, we use the following values of 
e to compare the various algorithms in the next section, e = 10“^ for e-PPP, e = 1 
for SQP-2QP, and e = 10“^° for SMQN, Algorithms II.2 and II.3. 

2. Comparison 

In this subsection, we compare the algorithms over a set of problem instances 
from Polak et ah (2003); Zhou and Tits (1996) as well as randomly generated in¬ 
stances; see Appendix A. 

a. Minimizing the Maximum of up to 100,000 Functions 

Table 5 summarizes the run times (in seconds) of the various algorithms, 
with Columns 2 and 3 giving the number of variables d and functions q, respectively. 
Run times in boldface indicate that the particular algorithm has the shortest run 
time for the specihc problem instance. The numerical results in Table 5 indicate that 
in most problem instances, the run times are shortest for SQP-2QP or Algorithm 
II.3. Table 5 indicates that SQP-2QP is significantly more efficient than SQP-IQP for 
problem instances ProbA-ProbG. This is dne to the efficiency of the active-set strategy 


51 




in SQP-2QP, which is absent in SQP-IQP. However, for ProbJ-ProbM, SQP-IQP is 
comparable to SQP-2QP. This is because at the optimal solution of ProbJ-ProbM, 
all the functions are active. This causes the active-set strategy in SQP-2QP to lose 
its effectiveness as the optimal solution is approached. 

Table 5 indicates also that Algorithm II.2 is more efficient than SMQN 
for most problem instances. As the only difference between the two algorithms lies 
in their precision-parameter adjustment scheme, this highlights the sensitivity in the 
performance of smoothing algorithms to the control of their precision parameters. 
Table 5 also shows that Algorithm II.3 is more efficient than Algorithm II.2 and 
SMQN for most problem instances. 

Table 5 indicates that SQP-2QP is generally more efficient than Al¬ 
gorithm II.3 for problem instances with small dimensionality, d < 4 (specihcally 
ProbA-ProbG), and vice versa. This is consistent with the common observation that 
SQP-type algorithms may be inefficient for problems with many variables; see for 
example Zhou and Tits (1996). 

Table 5 shows that some algorithms return locally optimal solutions for 
some problem instances (labeled “local” in Table 5). In view of these results, there is 
an indication that smoothing algorithms (SMQN, Algorithms II.2 and II.3) tend to 
hnd global minima more frequently than PPP and SQP algorithms. 
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b. Minimizing the Maximum of up to 1,000,000 Func¬ 
tions 

Table 6 presents similar results as in Table 5, but for larger q. We do 
not present results for PPP and SQP-IQP as the required QPs exceed the memory 
limit. The comprehensive sensitivity studies for e show signihcant improvement for 
Algorithm II.3 for ProbJ-ProbM if a large e is used. Hence, we include the results for 
Algorithm II.3 with e = 1000 in Table 6. This e-value means that there is effectively 
no active-set strategy. Sensitivity tests conducted for the other algorithms with a 
larger e show no improvement in their run times. 

The observations from Table 6 are similar to those for Table 5. Table 
6 indicates that Algorithm II.3 with e = 1000 is efficient for ProbJ-ProbM, which 
has large d and a signihcant number of functions active at the optimal solution. For 
completeness, the run times for Algorithm II.3 with e = 1000 for ProbJ-ProbM in 
Table 5 are 2.8, 14.3, 0.36 and 3.0 seconds respectively, while the run times for the 
other problem instances are longer than Algorithm II.3 with e = 10“^°. 

The results in Tables 5 and 6 indicate that among the algorithms con¬ 
sidered, SQP-2QP and Algorithm II.3 are the most efficient algorithms for minimax 
problems with a large number of functions. The run times for ProbJ-ProbM indi¬ 
cate that SQP-2QP is less efficient for problem instances with a signihcant number 
of the functions that is nearly active at the solution, as the active-set strategy loses 
its ehectiveness. 
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c. 


Randomly Generated Problem Instances 


The problem instances from the literature examined in Tables 5 and 
6 include either cases with few functions e-active at an optimal solution (ProbA- 
Probl) or cases with all functions e-active (ProbJ-ProbM). We also examine randomly 
generated problem instances with an intermediate number of functions e-active at 
the optimal solution; see ProbN in Table 17 of Appendix A. The optimal values are 
unknown in this case but the target values given in Table 17 of Appendix A appear 
to be close to the global minima. 


d 

Q 

SQP-2QP 
(6 = 1) 

Algo II.3 SD 
(e = 1000) 

Algo II.3 QN 
(e = 1000) 

10 

10,000 

0.42 

0.64 

0.62 

100 

10,000 

0.82 

0.48 

0.54 

1,000 

10,000 

124.9 

0.38 

4.8 

10 

100,000 

4.1 

3.8 

4.2 

100 

100,000 

11.5 

3.8 

4.1 

1,000 

100,000 

mem 

4.3 

9.7 

1,000 

1,000,000 

mem 

37.2 

42.5 

1,000 

10,000,000 

mem 

421.8 

492.5 

10,000 

100,000 

mem 

6.3 

mem 


Table 7. Run times (in seconds) of algorithms on problem instance ProbN. “SD” and 
“QN” indicate that Algorithm II.3 uses i?po(-) given by (11.47) and (11.48), respec¬ 
tively. The word “mem” indicates that the algorithm terminates due to insufficient 
memory. 


Table 7 presents the run times for Algorithm II.3 and SQP-2QP on 
ProbN. As the problem instances are relatively well-conditioned, Algorithm II.3 with 
Bpn{-) given by (11.47), i.e., a steepest descent (SD) direction, may perform well and 
is included in the table. The parameter e for Algorithm II.3 is set to 1000 for this 
set of problem instances, as preliminary test runs show that it is consistently better 
than other choices. Table 7 indicates that SQP-2QP is less efficient than Algorithm 
II.3 for problem instances with large d, and where there is a signihcant number of 
functions e-active at the optimal solution. The last row in Table 7 shows that for 
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problem instances with d > 10,000, the storage of the d x d -ffpo(-) matrix for both 
SQP-2QP and Algorithm II.3, with i?po(-) given by (11.48), causes both algorithms 
to terminate due to memory limitations. Thus, Algorithm II.3, with i?po(') given by 
(11.47), which does not have any matrix to store, may be a reasonable alternative 
when d is large. 

F. CONCLUSIONS FOR FINITE MINIMAX 

This chapter focuses on minimizing the maximum of many functions and 
presents complexity and rate-of-convergence analysis of smoothing algorithms for such 
problems. We hnd that smoothing algorithms might only have sublinear rates of con¬ 
vergence, but their complexity in the number of functions is competitive with other 
algorithms due to small computational work per iteration. We present two smoothing 
algorithms with novel precision-adjustment schemes and carry out a comprehensive 
numerical comparison with other algorithms from the literature. We hnd that the 
proposed algorithms are more efficient than a recent smoothing algorithm from the 
literature, due to the more efficient precision-adjustment schemes implemented. The 
proposed algorithms are competitive with SQP algorithms, and especially efficient for 
problem instances with many variables, or where a signihcant number of functions 
are nearly active at stationary points. The numerical results indicate that smoothing 
with hrst-order gradient methods is likely the only viable approach to solve hnite 
minimax problems with many functions and variables due to memory issues. 
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III. 


SEMI-INFINITE MINIMAX PROBLEM 


A. INTRODUCTION 

In this chapter, we consider semi-inhnite minimax problems of the form 

(SMX) (III.l) 

a;eIR‘* 

where —)■ M is dehned by 

= max0(a;,|/), (III-2) 

yeY 

F is a compact inhnite subset of 0 : x —)■ M, d, m G N, and 0(-, •) is 

continuous and sufficiently smooth on x F as specihed below. Note that if Y is 
a hnite set instead, we have the hnite minimax problem (FMX) from Chapter II. 
The notation in each chapter is self-contained. Hence, we here as well as below reuse 
some symbols from Chapter II in dehnitions of new quantities. The data m, i.e., the 
dimension of y, is as we see below a key quantity and we refer to as the uncertainty 
dimension. 

In general, (SMX) is used by decision makers to determine the optimal re¬ 
sponse to the worst-case scenario. (SMX) arises in applications such as hnance 
(Rustem & Howe, 2002), electrical circuit theory (Demyanov & Malozemov, 1974), 
and policy optimization (Becker, Dwolatzky, Karakitsos, & Rustem, 1986). Solving 
(SMX) is difficult for two reasons: (i) for any x G Mt^,'ip{x) may not be computable 
in hnite time because of the global maximization involved, and (ii) ip{-) may not be 
differentiable everywhere. 

Several methods have been proposed to solve (SMX); see Rustem and Howe 
(2002, Chapter 2) for a survey of semi-inhnite minimax algorithms. A key method 
for solving (SMX) is the use of semi-inhnite programming (SIP) methods. (SMX) 
can be reformulated into the SIP 


(SMX') min {z \ (j){x,y) — z <0 \/y eY}, 

(x,2:)eK‘*+l 


(IIL3) 
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involving an infinite number of constraints, which can then be solved by any SIP algo¬ 
rithm. SIPs are usually solved by solving a sequence of hnite problems, i.e., problems 
with a hnite number of constraints. Depending on how the hnite problems are cre¬ 
ated, we can generally group SIP algorithms into three classes: exchange algorithms, 
local reduction algorithms, and discretization algorithms; see Lopez and Still (2007); 
Hettich and Kortanek (1993); Reemtsen and Corner (1998) for surveys on the theory, 
applications and algorithms of SIP. 

In exchange algorithms (Kortanek & No, 1993), at each iterate {xi,Zi) G 
i e N, new constraints Vi) — z < 0 corresponding to a maximizer iji G 
argmaXj^gy ^(a;*, y) are added to the hnite problem, and existing constraints removed, 
i.e., an exchange of constraints occurs. In local reduction algorithms (Price & Coope, 
1990), under certain regularity assumptions, the SIP can be converted locally into a 
hnite problem. 

Discretization algorithms are one of the more popular classes of algorithms 
for solving SIPs due to their simplicity. They create hnite problems by considering 
a hnite discretized subset of Y. To achieve the required solution tolerance, most 
discretization algorithms implement some kind of adaptive discretization rehnement 
rule to gradually increase the level of discretization, rather than hx the discretization 
at a high level right from the start. In this chapter, we refer to those algorithms 
that are applied to solve the individual discretized problems as algorithm maps, to 
diherentiate them from the overall discretization algorithm that usually includes some 
adaptive discretization rehnement rule. At each stage of the algorithm, the level 
of discretization is hxed and an algorithm map is used to solve the hnite problem 
approximately. The approximate solution is then usually used to warm-start the 
next stage. We refer to Hettich (1986); Reemtsen (1991); Polak and He (1992); Polak 
(1997) for examples of discretization algorithms. 

There are also algorithms that directly address (SMX) without the reformu¬ 
lation to SIP. The algorithms in Chaney (1982); Klessig and Polak (1973) assume 
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that the maximum in (III.2) occurs at a unique point y{x), for all X e which en¬ 
sures that V’(‘) is differentiable. Smooth optimization methods, such as the method 
of centers algorithm in Chaney (1982) and a hrst-order, feasible directions method 
in Klessig and Polak (1973), are then used to minimize the smooth (j){-,y{-)). The 
conceptual algorithm in Panin (1981) and an implementable version in Kiwiel (1987) 
use a convex piecewise linear approximation of '^(•) to solve (SMX). As (SMX) be¬ 
longs to the general class of nonsmooth problems, nonsmooth optimization algorithms 
such as subgradient and bundle algorithms (Rnstem & Howe, 2002) can be used as 
well. Subgradient algorithms determine the descent direction by computing at least 
one snbgradient at each iterate, while bundle algorithms use subgradient information 
over several snccessive iterates to determine the descent direction. A discretization 
algorithm that does not involve the reformulation into a SIP is proposed in Demyanov 
and Malozemov (1971). The algorithm solves an inhnite sequence of finite minimax 
problems of the form 

min max 0(a;, y), (III-4) 

xSK'* ySVjv 

where Yn, N eN, are finite discretized subsets of Y. This approach is fundamentally 
the same as converting (SMX) into a SIP and then applying discretization methods. 

In this chapter, we propose a novel way of expressing rate of convergence, in 
terms of computational work instead of the typical number of iterations. We hrst 
discuss the inadequacy of the typical rate of convergence. We consider two adaptive 
discretization algorithms (Polak & He, 1992; Polak, Mayne, & Higgins, 1992) to solve 
(SMX). Polak and He (1992) propose a set of discretization rehnement rules, which 
ensures that their adaptive discretization algorithm generates sequences that con¬ 
verge to a solution of the original SIP problem at the same linear rate with the same 
estimated rate constant as that of the linearly convergent algorithm map used in the 
discretization algorithm. Another similar study that investigates this rate-preserving 
idea is fonnd in Polak et ah (1992) for a semi-inhnite minimax algorithm, which uses 
an extension to Newton’s method as the algorithm map. Polak et al. (1992) state 
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that the rate of convergence for their adaptive discretization algorithm is superlinear. 
Without further information, a user probably will select the superlinearly convergent 
algorithm to solve (SMX). However, for the superlinearly convergent algorithm, be¬ 
cause the level of discretization needs to increase rapidly to achieve the superlinear 
rate, the computational work between iterates increases rapidly. Thus, the compu¬ 
tational time may not be well-correlated to the superlinear rate of convergence since 
the typical rate of convergence does not consider computational work. 

To our knowledge, there has been no rate-of-convergence result that considers 
computational work for discretization algorithms for SIP and (SMX). That said, not 
all rate-of-convergence results are in terms of the number of iterations. Still (2001) 
studies how the rate of convergence for SIP discretization algorithms depends on the 
level of discretization and whether the discretization includes boundary points of Y 
in a specihc way. Shapiro (2009) determines the rate of convergence of an e-optimal 
solution of the discretized problem to the set of optimal solutions of the SIP problem, 
as a function of the level of discretization. 

In our proposed way of expressing rate of convergence, we relate computational 
work to the number of iterations as well as to the level of discretization by making 
some computational work assumptions. This relation allows us to determine the rate 
of decay of a bound on the error between the iterates generated from the discretized 
problems and the optimal solution of (SMX) as a function of computational work, 
which we refer to as rate of decay of error bound in the rest of the chapter. We 
use this new way to develop rate-of-convergence results for various fixed and adaptive 
discretization algorithms for (SMX) and compare them against the rate of convergence 
of an e-subgradient algorithm. We show that the new way allows a fairer comparison 
of the various algorithms than the typical rate of convergence. We also conduct 
numerical studies to validate the theoretical results we obtain. 

The next section describes the discretization approach and determines the 
rate of decay of error bound for discretization algorithms using algorithm maps with 
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varying rate of convergence. Section C determines the rate of decay of error bound 
for an e-subgradient algorithm and compares it against the discretization algorithms. 
Section D contains numerical results. 


B. EFFICIENCY OF DISCRETIZATION ALGORITHM 

We start this section by describing the discretization approach for (SMX) and 
include for completeness some known results that we use in later subsections. 

1. Discretization 

The discretization approach involves approximating T by a hnite subset Yat C 
Y, where IYatI = N (| • | denotes the cardinality operator), and approximately solving 
the resulting hnite minimax problem 

(SMX^) min'^jv(a^), (III.5) 

xeiR'i 

where ^ M, iV G N, is dehned by 

i/^Nix) = max(l){x,y). (III.6) 

y&YN 

(SMXat) can be solved using any hnite minimax algorithms, such as those in Chapter 
II. In the remainder of this chapter, we refer to elements of Yat as grid points. When 
they exist, we denote the optimal solutions of (SMX) and (SMXat) by x* and 
respectively, and the corresponding optimal values by ip* and We next state some 
properties of '^(•) and t/’w(-)- 

Proposition III.l. The following facts hold: 


(i) For all x and iV e N, iPn{x) < ip{x). 

(a) Suppose that is continuous onRf xY. Then%p{-) andpjNp) are continuous 
for any N eN on Mf. 

(Hi) For all N eN, 

rN < r- (ni.7) 
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Proof. The conclusion (i) follows directly from the dehnitions of '^(•) and ipNi'), and 
the fact that C Y. Part (ii) follows, for example, from pp. 51 and 187 of Demyanov 
and Malozemov (1974). For part (hi), by dehnition of x*j^, 'ipN{x*p^) < 'ipNix) for all 
X (which includes x*), thus, based on part (i), 

rN=< Mx*) < ij{x*)=r- (ni.8) 

□ 

In this section, we focus on the following basic hxed discretization algorithm, 
for which we develop a series of rate of decay of error bound results. 

Algorithm III.l. Fixed Discretization Algorithm 
Data: xq G 

Parameters: Discretization parameter A G N and parameters required for the al¬ 
gorithm map. 

Step 1. Generate a sequence by applying an algorithm map to (SMXat). □ 

We need the following assumptions for the rate of decay of error bound anal¬ 
ysis. The operator || ■ || denotes the Euclidean norm. 

Assumption III.2. The functions (^{x, ■),x G are uniformly Lipschitz continuous 
in y, i.e., there exists a constant L < oo such that 

\(j){x,y) -(j){x,y')\ < L\\y-y'\\ (III.9) 

for all a; G and y, y' G M™. □ 

We require an assumption on the discretization scheme, which dictates how 
Iat is generated from Y given a A G N. We assume that the same discretization 
scheme is used throughout this chapter for the various algorithms. 

Assumption III. 3. There exists a Ai G N, a discretization scheme defined for all 
A G N, A > Ai, and a monotonically decreasing function : N —)■ M, where m is 
the dimensionality ofY and Am{N) —)■ 0 as A —)■ oo, such that 

0 < 'ipi^x) - 'ipNix) < Am{N) (III.IO) 
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for all a; G and N E N, N > Ni. In addition, there exists a L' < oo such that the 
discretization error Am{N) can be expressed as 

for all N Efl, N > Ni,m Efl. □ 

Under Assumption III.2, Assumption III.3 holds for example when Y is the 
unit cube [0,1]*”, and Yat, > 2"^, is the uniform grid Iff dehned in each of m 
dimensions by 

= 1 °’ [ivv-j -1’ [ivv-j -1’ •••’ 
and [-J denotes the floor function. 

There are grid points in each of the m dimensions of Y, for a total of 

grid points. Thus, each grid element is a cube with length |^jyi/Lj_x for each 
edge of the grid element. 

To continue the discussion, we need a way to quantify the “distance” between 
two sets. We use Hausdorff distance for this purpose. The Hausdorff distance between 
Y and Y^ is dehned as 

dist(Y, Yat) = max min \\y' — y\\. (III.13) 

yeY y'&N 

The Hausdorff distance between Y and Yat is the maximum distance between any 
point y eY and its nearest grid point in Yat. 

For the unit cube example, the Hausdorff distance between Y and Yat is then 
the distance from the center to a corner of the grid element, which, based on the 
Euclidean distance of two points in m-dimensional space, is 

Let y eY{x)= arg max^gy ^(a;, y), and yi E Y^ be the nearest grid point to 
y. Based on the dehnition of the Hausdorff distance, 

11^1 “ ^11 - 2 ([ Au '^- 1 ) - 
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Under Assumption III.2, there exists a constant L < oo such that 


(j){x,y) - (j){x,yi) < L\\yi 


(III.16) 


A 


Let Yn{x) = argmaXj^gy^ (t>{^:y) and y 2 E Yn{x). Thus, (j){x,y 2 ) > (j){x,yi) and 


0(a;, y) - 0(a;, 1 / 2 ) < 0(a;, y) - 0(a;, yi). 


(III.17) 


Since 'ip{x) = (pix^y) and 'ipN^x) = (f){x,y 2 ) by dehnition, for N > 2"^, 

0 < 'ip(x) — 'iPn(x) < L (III.18) 

There exists a A^i G N such that 


< L 


m 


2{NYm - 2) - AT/' 


(III.19) 


for all N > Ni. This completes the verihcation that Assumption III.3 holds for the 
unit cube with a uniform grid. 

We need the following strong convexity assumption, which is standard for 
rate-of-convergence analysis; see for example Polak et al. (1992). 


Assumption III.4. The function (f){-,y), for all y eY, is twice continuously differ¬ 
entiable, and there exists an a E (0, cxd) such that 


a\\zf < {z,Vl,,(l){x,y)z), 

for all x,z E and y eY . 


(III.20) 

□ 


In the following subsections, we derive the rate of decay of error bounds of fixed 
(Algorithm III.l) and adaptive (Algorithm III.2) discretization algorithms to solve 
(SMX) in terms of computational work. Hence, we need to dehne precisely what we 
mean by an error bound for the various algorithms. For hxed discretization algorithms 
(Algorithm III.l), we denote the iterate of a hxed discretization algorithm based on 
discretization parameter N by x^. Suppose that a computational budget b E (0, cxd) 
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is allocated to solve (SMX), and the computational work required to run Ub E N 
iterations of a fixed discretization algorithm on (SMXtvJ, Nb G N, is no larger than b. 
We refer to the quantity tjj — t/’* as the error. An upper bound on this quantity 
is referred to as an error bound. Obviously, there are many possible error bounds. 
We will dehne several specihc error bounds for analysis. 

For the rate of decay of error bound analyses in this section, we consider a hxed 
discretization algorithm. Algorithm III.l, with an ideal algorithm map that solves 
{SMX]\f^), Nb G N, exactly in one iteration; we consider an adaptive discretization 
algorithm. Algorithm III.2, and we consider Algorithm III.l with algorithm maps 
with quadratic, linear, sublinear rate of convergence, as well as a specihc case of a 
smoothing algorithm. 

We need the following assumption on computational work and budget for the 
rate analysis. 

Assumption III.5. There exist a G (0, cxd) and v G [1,cxd) such that the computa¬ 
tional work required in each iteration of the algorithm map in solving (SMXat) is no 

larger than for all N eN. □ 

The preceding assumption holds with v = 1 for the two smoothing algorithms 
proposed in Chapter II, and holds with u = 3 for the SQP and PPP algorithms 
discussed in Chapter II. Suppose that the assumption holds for the algorithm map 
under consideration and a computational budget of & G N is allocated to Algorithm 
III.l, to run Ub iterations of the algorithm map on (SMXjV;,). Then Nb and must 
be picked such that 

aN^Ub < b. (III.21) 

In the upcoming analyses, we see that the error bounds for the various algo¬ 
rithms often have two components. The hrst component is the error of not achieving 
the optimal solution of the discretized (SMXjvJ, which decreases monotonically as 
the number of iterations nt increase. The second component of the error bound is due 
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to the discretization error Ajn{Nh), and it decreases monotonically as Nh increases. 
From this point onwards, we ignore integrality of Nb and Ub to simplify analysis, since 
it will not affect the subsequent rate analysis as our focus is on asymptotic rate of 
decay of error bounds, when Nb and Ub —)■ oo. Since Nb and Ub are constrained by the 
inequality in (III.21), for any Ni and ni that satisfy (III.21) with strict inequality, 
there must exist a N 2 > Ni and a n 2 > ni that satisfy (III.21) with equality, i.e., 

aN^Ub = b, (111.22) 

which produces a smaller error bound. Thus, we use (III.22) instead of (III.21) for 
subsequent analysis. 

Let and {nbjbgN be sequences that satisfy (III.22) for all b E N. We 

dehne {(Nb,nb)}befi as a candidate selection. Suppose that a particular algorithm 
has error bound Cb, & G N. Obviously, there are many candidate selections that 
make {efejfegN converge to zero. However, some candidate selections result in faster 
rates than others, and we want to hnd these selections. We note that the topic of 
determining algorithm parameter values to optimize algorithm efficiency has been 
addressed in the area of simulation optimization (Pasupathy, 2010; Lee & Glynn, 
2003). 

We hrst consider the rate of decay of error bound Cb for an ideal algorithm 
map, which solves (SMXjV;,) exactly in one iteration for any W G N. 

2. Ideal Algorithm Map 

Suppose that Assumptions III.3 and III.5 hold. Suppose also that a compu¬ 
tational budget 6 G N is allocated to Algorithm III.l with an ideal algorithm map to 
solve (SMXatJ. Since (SMXatJ is solved exactly in one iteration, = 0- 

Based on Proposition IILl(iii) and (III. 10), 

^p(x^'’) - 'll)* < )l)Ni,(xi'’) + A^(Nb) - )l)*N^ = A^(Nb) = 

68 


(111.23) 



where L' is as in Assumption III.3. We define 


ideal _ 

— 


A L'y/m 


N, 


1/m 


(III.24) 


Theorem III.6. Suppose that Assumptions III. 3 and III. 5 hold. Suppose also that 
a computational budget h Eli is allocated to Algorithm III.l with an ideal algorithm 
map to solve (SMXjvJ. Then the error bound 


ideal 



(III.25) 


for all b eN, where L' is as in Assumption III. 3, and a and v are as in Assumption 
III. 5. 


Proof. Since an ideal algorithm map is used, nb = 1 for all b Efi, and from (III.22), 
Nb = {bjaf-!''. The conclusion follows by substituting Nb into (III.24). □ 

The result above states that decays at an asymptotic sublinear rate of 
jj-i/mu ^ QQ Since the ideal algorithm map solves the discretized problems ex¬ 
actly (in one iteration), the rate-of-decay result for determines the rate at which 
the error between the function values at the solutions of the discretized problems and 
the function value at the solution of the semi-infinite problem decays, as the level 
of discretization increases. Similarly, the rate-of-convergence results in Still (2001) 
and Shapiro (2009) determine the rate at which the error between the solutions of 
the discretized problems and the solution of the semi-infinite problem decays, as the 
level of discretization increases. Thus the rate-of-decay result for is related to 
the rate-of-convergence results in Still (2001) and Shapiro (2009). 

3. Adaptive Discretization Algorithm 

The preceding result for fixed discretization can be generalized for a potentially 
more efficient adaptive discretization algorithm as follows. For the following adaptive 
discretization algorithm, we adopt a different notation (from the fixed discretization 
algorithm) for the iterates, specifically, we denote the iterate at the T*® stage by 

Xij. 
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Algorithm III.2. Adaptive Discretization Algorithm 
Data: xq E 

Parameters: Number of stages s G N, discretization parameters A* G N, 

number of iterations in the stages G N, and parameters required for the 

algorithm map. 

Step 1. Set i = 1. 

Step 2. If i > 1, warm-start from the last iterate of the previous stage by setting 
Xi^i = Xi-i^m.i- Else, set Xi^i = Xq. 

Step 3. Generate a sequence by applying a hnite minimax algorithm map 

to (SMXj^J. 

Step 4. If i < s, replace i by i -|- 1, and go to Step 2. Else, end. □ 

Suppose that a computational budget b E N is allocated to Algorithm III.2 
with an algorithm map with an arbitrary rate of convergence to solve (SMX). Suppose 
also that Assumptions III.3 and III.5 hold. 

Based on Proposition III. 1 (hi) and (III. 10), 


^(Xs,nJ + ^m(K) “ (III.26) 

We dehne 

ef*- rN.- (111.27) 


Proposition III. 7. The error bound 

adaptive . L' 

— Jjl/mu 


(III.28) 


for all b eN, where L' is as in Assumption III. 3, and a and v are as in Assumption 
III. 5. 


Proof. The parameters for Algorithm III.2, s G N, {Aj}|^]^, A* G N, and ni E 

N satisfy 

a (Aj^ni + Nfn2 + ... + A>,) = b. (III.29) 
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This implies that aN^ < b, and thus, 


adaptive 


'ipNs{Xs,ns) + Am(iVs) - > Am(iVs) > 



(III.30) 


based on (III.11) and the assumption that Am(-) is a monotonically decreasing func¬ 
tion. □ 

The result above indicates that the for Algorithm III.2 with any al¬ 

gorithm map of any convergence rate, asymptotically decays with a rate no faster 
than b~^l^'^, as 6 —)■ oo. In the following subsections, we show that this optimal rate 
of can be achieved using the fixed-discretization Algorithm III.l with certain 

algorithm maps. 

We say that an algorithm map converges uniformly when applied to (SMXtv), 
A^ G N, if the respective constants c and ni in Section I.D.2 do not depend on N. 


4. Quadratically Convergent Algorithm Map 

We obtain an error bound for Algorithm III.l with a uniform quadratically 
convergent algorithm map in the next lemma. We refer to Section I.D for dehnitions 
of the various rates of convergence and uniform convergence. 


Lemma III.8. Suppose that Assumptions III.3 and III.4 hold. Suppose also that 
Algorithm III.l with a uniform quadratically convergent algorithm map is used to 
solve (SMX), i.e., there exist ni G No,ni < oo, and ci G (0,cx)) such that 


< Cl, 


for all n > Ui. Then there exist c, k < oo such that for all n > ni and N E N 


(III.31) 


fjix^) -fj* < -h AmiN). 


(III.32) 


Proof. Based on Proposition Ill.l(iii), (III.10), and (III.31), 

< cr"^-M^iv«) - + A™(iV). (III.33) 
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From (III.10), —< —'ip{x*j^) + Am{N). Based on Assumption III.3, Am{N) is a 
monotonically decreasing function, thus Am{N) < Am{Ni) for all G N, > A^i. 
Since 'ipix*^) > -ipi^x*) by definition. 


■“i-lr 


* l2” 


Cf ^ bpN{Xn,) 


< [ci {ip{x^J + A^{Ni)) 


2" [ci {jjjx^J -jj* + A^(A^i))] ^ 

Cl 


-(111.34) 


where A^i is as in Assumption III.3. Above we use the fact that for uniform conver¬ 
gence, rii is independent of N. 

Under Assumption III.4, x^^ is bounded for any ni G N and A^ G N, A^ > A^i. 
Since is continuous, 'ipix^J is bounded for any ni G N and A^ G N, A^ > A^i. Based 
on Assumption III. 4, is hnite. As Ni G N, Am(A^i) < cxd based on Assumption 
III.3 and (III.11). Finally, ci and ni are independent of N based on the assumption 
of uniform convergence, thus c and k < oo. □ 

From (III.32), we dehne the error bound for Algorithm III.l with a quadrati- 
cally convergent algorithm map as 


quad A 2"i) 
Cb — C 


^ + Am{Ni,). 


(III.35) 


The next result states that if we choose the candidate selections in a certain 
way, then a hxed discretization algorithm with a quadratically convergent algorithm 
map can achieve the same optimal asymptotic rate of decay of error bound as Algo¬ 
rithm III.2. 

We use log(-) to denote the natural logarithm. 

Theorem III. 9. Suppose that Assumptions III.3, 111-4, ///.5 hold. Suppose also 

that a computational budget 6 G N is allocated to Algorithm III.l with a uniform 
guadratically convergent algorithm map with rate of convergence given by (III. 31) to 
solve (SMX). If 

6 log 2 

(j [log log (6/(7) — log(—mz/logc) 



l/u 


(III.36) 
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and 


Ub = 


loglog(6/(T) — log(—mz/logc) 


log 2 


(III.37) 


for all b eN, then (III.22) is satisfied and 

log e 


lim 


quad 

b 


(III.38) 


b^oo log b mu 

where m eH is the uncertainty dimension and v is as defined in Assumption III. 5. 

Proof. From (III.11), (III.22), and (III.32), 


log e' 


quad _ 


log ^exp [log K + log c] + exp log L'^/m — log{N^^^) 

log (^exp [log K + iQg^j 

6 log 2 \ 


exp log L'^/m -log < ^^^-- 

mu \(T[loglog(o/(T) — log(—mz/logc)J 

= log ( exp [log K + (log c) exp[log log(6/ a) — \og{—mu log c)]] 


exp 


\ogL' y/m -log 


6 log 2 


= log exp 


mu \(T[loglog(6/(j) — log(—mz/logc)] 
log K + (log c) ^ ^ 


mu log c 

L'i/m(T[loglog(6/(j) — log(—mz/logc)] 


= log K - 

. (T 


+ 


6 log 2 / 

L'i/m(T[loglog(6/(j) — log(—mz/logc)] 


1 

miy 


= log 


&log2 

L'i/m(T[loglog(6/(T) — log(—mz/logc)] 


6 log 2 


K 


X 


(9 


1 

mv 


Vy/m(7\[og \og{h/(7)—\og(—rav log c) 
6 log 2 



(III.39) 
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where we use the fact that = exp(a;) for a; G M. Simplifying the second term 

within the log(-) function, we obtain that 


K, 



1 

mi/ 


L'y/m(T[log \og{b/cr)—log(—mi' log c) 
6 log 2 


1 

mu 


K 



1 

mu 


L'y/ma[log \og{b/cr)—log(—mi' log c)] 
^g2 


1 

mu 


(III.40) 


Since 

_^ 

lim -- — = 0, 

f L'y/rncr[\oglog{b/a)—\og{—mulogc)]\”^'' 

V log2 ) 

then by continuity of the log(-) function, 


lim log 

b^oo 



\ I L'^/mcr[\og log{b/a)—\og{—mi' log c)] 


\ 

+1 

nu 


0 . 


(III.41) 


(III.42) 


Therefore, continuing from (111.39), 


lim 

b^oo 


1 quad 

log Cfe 

log 6 


lim - 

b^oo mu 

log 6 
log 6 


log L'y/ma 
log b 

log log 2 \ _ 
log 6 / 


+ 


log[loglog(&/(j) — log(—mz/logc) 
log b 


1 

mu 


(III.43) 


This completes the proof. □ 

Roughly, what Theorem III.9 says is, if you make certain choices for the dis¬ 
cretization, by picking Nf, and rib as in (III.36) and (III.37), respectively, for large b, 
if b increases by a factor bi G (1, cxd), then ^g(;.j,gg^ggg py factor b^ . 


5. Linearly Convergent Algorithm Map 

We next obtain an error bound for Algorithm III.l with a uniform linearly 
convergent algorithm map. 


Lemma III.10. Suppose that Assumptions III.3 and III.4 hold. Suppose also that 
Algorithm III.l with a uniform linearly convergent algorithm map is used to solve 
(SMX), i.e., there exist ni G No, ni < oo, Ni G N, and c G (0,1) such that 


^jv(<+i) - i^*N 


< C, 


(III.44) 
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for all n > Til and N > Ni. Then there exists a k < oo such that for all n > ni and 
N > Ni, 

+ A™(iV). (III.45) 

Proof. Based on Proposition Ill.l(iii), (III. 10), (III.44), and using similar arguments 
as the proof for Lemma III.8, 

< c«[^^«)-^^]+A^(iV) 

< c"(c-"M^«) -r + A^(Ai)]) + A^(iV), (III.46) 


where Ni is as in Assumption III.3. The remaining part of the proof follows the same 
arguments as the proof for Lemma III.8. □ 

From (111.45), we dehne the error bound for Algorithm III.l with a linearly 
convergent algorithm map as 

e|,“-'^c"^«; + A^(iV,). (111.47) 


The next result states that a hxed discretization algorithm with a linearly 
convergent algorithm map can achieve the same asymptotic rate of decay of error 
bound as Algorithm III.2. 


Theorem III.11. Suppose that Assumptions III.3, Ill.f, and III.5 hold. Suppose 


also that a computational budget 6 G N is allocated to Algorithm III.l with a uniform 


linearly convergent algorithm map with rate of convergence given by (III.44) to solve 


(SMX). If 


Nh = 


mbv log c 
a log b 


llv 


and 

blogb 
mbu log c 

for all b eN, then (III.22) is satisfied and 


lim 

b^oo 


log& 


1 

mz/’ 


(111.48) 

(111.49) 


(111.50) 
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where m G N is the uneertainty dimension and v is as defined in Assumption III. 5. 


Proof. From (III.11), (III.22), and (III.45), 

61ogc 


loge 


linear 


= log ^exp 
= log (exp 


am 


+ logK 


-ab log 6 log c 
ambn log c 


+ exp 


+ logK 


log L'^/m - log(iV^^'^™) 


+ exp 


logL'i/m — log 


-mbv log c 
alogb 


1/mv^ 


= log exp 


log L'y/m — log 


exp + log K 


( 

—mbn log c 

l/mu\ 

1 

alogb 

). 


X 


exp 


log L'y/m — log 


( 

—mbu log c 

l/mu\ 

1 

cr log b 

). 



Simplifying the second term within the outermost log(-) function, 
exp [^ + logK\ 


K 


—mbu log c 
(7 log b 


\lmv 


exp 


logL'A/m — log 


( 

—mbu log c 

l/mu\ 

b^/'^^L'y/m 

( 

cr log b 

). 



K 


—mu log c 
(7 log b 


1 Ijmu 


L'y/m 


Since 


K 


lim 

b^oo 


—mu log c 
(7 log b 


Ifmu 


L'^/m 

then by continuity of the log(-) function. 


= 0 , 


lim log 

b^oo 


exp [-^ + log K 


exp 


log L'^/rn — log 


( 

—mbu log c 

l/mu\ 

1 

a log b 

). 


T + 1 > = 0. 


(III.51) 


(III.52) 


(III.53) 


(III.54) 
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Therefore, from (III.51), 


lim 

b^oo 


log 
log 6 


lim 

b^oo 

lim 

b^oo 


log L' 


m 


log 


—mbu log c 
cr log b 


llmu 


log 6 


log 6 


1 f \og{—mv log c) + log b — log a — log log b 


mv 


log b 


mu 


(III.55) 


This completes the proof. 


□ 


6. Sublinearly Convergent Algorithm Map 

Since both the quadratically and linearly convergent algorithm maps obtain 
the ideal rate of decay of error bound of one may think that the rate of the 

algorithm map does not matter. But that is not the case, as the following counter 
example shows. We next obtain an error bound for Algorithm III.l with a uniform 
sublinearly convergent algorithm map. We dehne the initial error cq = ~ t/’*- 


Lemma III. 12. Suppose that Assumption III.3 holds. Suppose also that Algorithm 
III.l with a uniform sublinearly convergent algorithm map is used to solve (SMX), 
i.e., there exist Ni eN and a> 1 such that 




N ^ 
n+1 J 




N 


< 1 - 


~ n + a 

for all n eN, N E N, and N > Ni. Then for any n E N, N E N, N > Ni, 

a — 1 


(III.56) 


^ i^n) - < 


n — 1 + a 


Co + 2Am{N). 


(III.57) 
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Proof; Based on Proposition III.l, (III.10), and (III.56), 


^ i^n) - V 

< -IpN {Xn) + ^m{N) - 


< ( 1 -- 
a 

a — 1 


1 -|- fl 


n — 1 + a 


:4’n{xo) - + Am(iV) 


< 


a 

a — 1 


1. (X 


n — 2 + a 
n — 1 + a 


)Pn{xo) - 'iplr] + Am(iV) 


< 


n — 1 + a 
a — 1 

n — 1 + a 


[^(a;o)-^* + A^(iV)] + A,„(iV) 
Co + 2Am{N). 


(III.58) 


This completes the proof. □ 

From (III.57), we dehne the error bound for Algorithm III.l with a sublinearly 
convergent algorithm map as 


b A « 1 + 2AUNb). (III.59) 

Tib — 1 + a 

The next result states that a hxed discretization algorithm with a sublinearly 
convergent algorithm map is unable to achieve the same asymptotic rate of decay of 
error bound as Algorithm III.2. 


Theorem III. 13. Suppose that Assumptions III.3, 111-4, III. 5 hold. Suppose 
also that a computational budget h Eli is allocated to Algorithm III.l with a uniform 
sublinearly convergent algorithm map with rate of convergence given by (III. 56) to 
solve (SMX). Then for all possible sequences of {{Nb,nb)}b£K, 

lop- 1 

liminf ——^ >-, (III.60) 

b^oo log b mn 

where m eH is the uncertainty dimension and v is as defined in Assumption III. 5. 
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Proof. From (III.11), (III.22), and (III.57), 


log e™'’ = log exp [log(a - 1) + log Cq - log(nfe - 1 + a); 


, [i or' r- , loganb 

+ exp log 2L \/m -1- 

mv mu 


> log ( max < exp [log(a — 1) + log cq — log(nb — 1 + a)], 


or' ^ , logarrfe 

exp log 2L ^/m -1- 

mu mu 


= max < log (exp [log(a - 1) + log Cq - log(nb - 1 + a)]), 


1 ( h or' ^ ^ogb log aub 

log exp log 2L y/m -1- 

\ L mu mu 

= max / log(a — 1) + logeo — log(rrfe — 1 + a). 


, or' r- , logcTOfe 

log 2L y/m -1- 

mu mu 


(III.61) 


Hence, for any b > 2, 


> max 


log(a - 1) log ep _ log(nfe - 1 + a) 

log b log b log b 

log2L'y/m log 6 ^ log crrrfe 1 

log 6 mulogb mulogb j 


(III.62) 


For the sake of contradiction, we assume that there exists a sequence {Nb, nb}b >2 


where 


b^oo log b mu 


(III.63) 


This implies that for every e > 0, there exists an inhnite subsequence H C N, a 


G H, 6i > 2 such that 


loger" ^ 1 I ^ 

log b ~ mu ’ 


(III.64) 


for all & G i3,6 > 6i. From (III.62) and (III.64), 


log(a - 1) log ep _ log(rrj, - 1 + a) ^_L + g 

log 6 log 6 log 6 “ mu ’ 


(III.65) 
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and 


log 2 L'i/m 1 ^ Xogarih ^ 1 

log& mv mv\ogh ~ mu 

for aA\ b E B,b > bi. From (III.65) and (III. 66 ), there exists a, b 2 E B,b 2 > bi such 

that 


+ e, (III. 66 ) 


and 


log^fc 

log 6 


1 

>- 2 e, 

mu 


(III.67) 


< 2mz/e, (III. 68 ) 

logo 

for all b E B,b > 62 - Since e is arbitrary, (III.67) contradicts (III. 68 ) for sufficiently 
small e, and the conclusion follows. □ 


7. Smoothing Algorithm Map 

In this subsection, we analyze the rate of decay of error bound for Algorithm 
III.l using smoothing algorithms as algorithm maps. We hrst repeat some of the 
known results on the exponential smoothing technique from Section II.B, based on 
the assumptions and notation in this chapter. 

For any p > 0 and iV G N, we consider a smooth approximating problem to 
the generally non-differentiable (SMXat), 


(SMX^J min^jvp(x), (III.69) 

where 

'ipNpix) = -log ( ^ exp(p0(a;,|/)) J (III.70) 

= iJN{x) + - log I ^ exp (p(0(a;, y) - ^n(x))) j (III.71) 
^ \yeYjv / 

is the exponential penalty function. 

The parameter p > 0 is the smoothing precision parameter, where a larger p 
implies higher precision. With the obvious notational changes, we have a similar result 
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on the bounds for 'ipNp{x) — ipiq{x) and the differentiability of ipNp{-) as Proposition 

II.l. 

Assumption III.14. The function pp, •) is twice continuously differentiable on x 

Y. □ 

Lemma III.15. Suppose that Assumption III. If holds. Then for every bounded set 
S there exists an L < oo such that 

{z,V^iljNp{x)z) <pL\\z\\f (III.72) 

for all X e S, z e iV G N, and p >1. 

Proof. The proof follows similar arguments as the proof for Lemma II.7 on p. 20. □ 

When they exist, we denote the optimal value of (SMXjVp) by ippfp for any 
N E N and p > 0, and the optimal solution of (SMXatp) by We denote the 
iterate of a sequence generated by an algorithm map when applied to (SMXatp) by 

rpNp 

•^n ■ 

Lemma III.16. Suppose that Assumption Ill.f holds. For any x,z E M.^,N E N, 
and p > 0, 

< {z,'\l‘^'pNpix)z) , (III.73) 

where a satisfies the inequality in Assumption Ill.f. 

Proof. The proof follows the same arguments as the proof for Lemma 11.5 on p. 19. 

□ 

The Armijo Gradient Method is referenced in the following proposition. The 
Armijo Gradient Method uses the steepest descent search direction and the Armijo 
stepsize rule to solve an unconstrained problem; see for example Algorithm 1.3.3 of 
Polak (1997). 

Proposition III.17. Suppose that Assumption Ill.f holds, N eN, and p > 1. Then 
the rate of convergence for the Armijo Gradient Method to solve (SMXatp) is linear 


81 



with coefficient 1 — k/p, for some k G (0,1). That is, for any sequence 

generated by the Armijo Gradient Method when applied to (SMXatp), there exists a 

k G (0,1) such that 

f^Npix^h) - fj*Np < (^1 - bl^Npix^^) - fj*Np)] for all n G Nq. (III.74) 
Proof. The proof follows the same arguments as the proof for Proposition II.8 on 

p. 22. □ 

Lemma III.18. Suppose that Assumptions III.3 and III.4 hold. If the Armijo Gra¬ 
dient method is applied on (SMXatp), where p > 1, iV G N, iV > Ni, and Ni is as 
defined in Assumption III. 3. Then for any n eN, 

-r<(l--] eo + 2A^(iV) + (III.75) 

\ pj p 

where k G (0,1) is the constant in Proposition III. 17. 

Proof. Since 4>{-,y) is twice continuously differentiable for all y eY, with an equiv¬ 
alent result for 'ipNpi') as Proposition II. 1, 'ipNpi') is continuous and 

0 < ipNp{x) - iPn{x) < (III.76) 

p 

for all TV G N, p > 0, and x E Based on Proposition III.17, 

ij{x^n-r 

< {x^^) + A ffiN) -fjN (x *) 

< fjjVp(Xn^) + Am(N) - IpNpix*) + (logiV)/p 

< f)Np{x^^) - f^*Np + Am{N) + (log TV)/p 

< (1 - {k/p))"" [ifNpixo) - f^*Np] + AffiN) + (log N)/p 

< (1 - (/c/p))” [iPn{xo) + (log N)/p - 'IpNix^p)] + Am(N) + (log N)/p 

< (1 - (/c/p))” [ip(xo) + (log N)/p - ^^(x^p) + A™(iV)] + A^(A) + (log A)/p 

< (1 - (k/p)r [ij(xo) - ij(x*)] + 2AffiN) + (2 log N)/p. (III.77) 
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This completes the proof. □ 

From (III.75), we dehne the error bound for Algorithm III.l with a smoothing 
algorithm map as 

^smooth A A _ jfc y ^ 2A^(iV,) + (III.78) 

V PbJ Pb 

The next result states that a hxed discretization algorithm with a smoothing 
algorithm map is unable to achieve the same asymptotic rate of decay of error bound 
as Algorithm III.2. 


Theorem III.19. Suppose that Assumptions III.3, 111 - 4 , III.5 hold. Suppose 
also that a computational budget 6 G N is expended by running Ub E N iterations 
of the Armijo Gradient method on (SMXat^pJ, with discretization parameter Nb G 
^,Nb > A^i ds defined in Assumption III. 3, and smoothing parameter pb > 1. Then 
for all possible seguences of {Nb,nb,Pb}bm satisfying (III.22), 


lim inf 

b^oo 


log e[ 


smooth 


log b 


> - 


mn 


(III.79) 


where m G N is the uncertainty dimension and v is as defined in Assumption III. 5. 
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Proof. From (III. 11), and (111.75), 

b 


log e' 


smooth 


= log exp 


log 1-+logeo 

Pb 


+ exp 


log 


2L\/m 


N, 


Ijm 


+ exp 


log 


2 log Nb 
Pb 


> log ( max exp 
2 log Nb 


am 


log 1-+ log eo 

Pb 


,exp 


log 


2L\/m 


N, 


Ijm 


exp 


log 


Pb 

max log f exp 




log 1-+ log eo 


Pb 


log exp 


log 


2L\/m 


N 


Xjm 


, log exp 


log 


2 log Nb 
Pb 


= max 


am 


k 


Pb 


log 1-+logeo,log 


2L'y/m , 21ogiVfe| 


N, 


Ijm 


,log 


Pb 


.(III.80) 


Hence, for any 6 G N, 6 > 2, 


log e 


smooth 


log b 


> max 


aN^ log b 


lose. log^ 

Pb J log 6 ’ log 6 ’ log & 


KIII.81) 


For the sake of contradiction, we assume that there exists a sequence 
{Nb,nb,Pb}beN where 

Ino- psmooth i 

liniinf^^S^h_< ^ 


b^oo log b mu 

This implies that there exists an inhnite subsequence H C N such that 


(III.82) 


log 1 

hm - — - <- 

fe^-®oo log b 


mu 


(III.83) 


which further implies that for any e G (0, there exists &bi E B such that 


log e 


smooth 


1 h — - 7 ^’ 

log b mu 

for all b E B,b > bi. From (III.81) and (III.84), we have 

b , f k\ 1 . 

log 1-<-+ k. 


am log b 


Pb 


mu 


(III.84) 


(III.85) 
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(III. 86 ) 


\og2L'^/m _ logiVfe ^ — + le 

log 6 mlog6 “ mu ^ ’ 


log 2 loglogiVfe logpfe 1 , 

-1-- -h ^6 

log 6 log 6 log 6 “ mu ^ ’ 


(III.87) 


for all b e B,b > max{ 2 , bi}. 

There exists ab 2 & B,b 2 > bi such that (log2L'A/m)/log6 > —for all b > 62 - 
From (III.85)-(III.87), we get that 

7, log ( 1 --) <-— + €, (III.88) 

ai\lj log b \ pb J mu 


log Nb ^ 1 , 

-i-7 —-^ 

m log b mu 


(III.89) 


log log Nb _ log Pb ^_ — + e 

log 6 log 6 “ mu 

for all & G i3, 6 > max{2, 62}- 

From (III.89), for all 6 G i3, 6 > max{2, 62 }, 


(IIL90) 


Nb > b-'' 


(III.91) 


From (111.90), for all 6 G i3, 6 > max{2, 62}, Pb > 1 , 


<-h e. 

mu 


This implies that 


logiVb 


(III.92) 


(III.93) 


Next, we substitute Nb > bi' from (III.91) into (III.93), and we get 

logiVfe log 6 ^-™" (I ,, /TTTn^^ 

Pb > = - - me 6 ™- log 6 , (III.94) 

0 0 mv~^^ \U J 
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(III.95) 


for all 6 G i3,6 > max{ 2 , 62 }- Using from (111.91), 

b ^ b _ b 

aN^ log 6 - ^ log 5 a log b ’ 

for all 6 G i3, 6 > max{2, & 2 }- Observe from (III.94) that if e < then — e > 0 
and pb ^ oo as b ^ oo. Thus, there exists a b^ E B,b 3 > 62 such that k/pb G [0,1/2] 
for all b > bs, and based on Lemma 11.13, 


11 2k 

log 1 ->- 

PbJ Pb 


(III.96) 


for all 6 G i 3 ,6 > 63. 

Based on (III.94)-(III.96), 

k 


aNk log b 


log 1 -> 

Pb 


-2k 


a log 6 (1 _ me) b^-^ log b 

-2k 


a \- — me 

\ V 




—mve—e 


(III.97) 


As e < — -— e > 0 and 


log b 

0 as & -)■ cx). 


7 —mve — e 1 


T I * II V, y— ^ Kj T”^ 7 T 

l+mu^ mu a(^-me)b^- 

Thus, there exists a 64 G i3 such that (III.97) contradicts (III. 88 ) for all b > 64. This 
completes the proof. □ 


C. EFFICIENCY OF e-SUBGRADIENT METHOD 

Section III.B shows that discretization algorithms for solving (SMX) can ob¬ 
tain at best an asymptotic rate of decay of error bound of ag 5 qq, where 

m is the uncertainty dimension, z/ is a parameter related to the work per iteration of 
the algorithm map, and b is the computational budget expended. Hence, discretiza¬ 
tion methods may perform poorly for (SMX) with large uncertainty dimension. In 
this section, we show that an e-subgradient algorithm for (SMX), which relies on ad¬ 
ditional assumptions as compared to discretization algorithms, have more favorable 
rate of decay of error bound for moderate and large m. 

This section starts with some dehnitions followed by a description of the e- 
subgradient algorithm. We then we determine the rate of decay of an error bound 



based on the e-subgradient algorithm. Most of the background information on the 
e-subgradient algorithm in this section are extracted from Bertsekas (2010). 

We start by dehning subgradients and sub differentials. 

Definition III.l. Let / : —)■ M be a convex function. A vector g is 

(i) a subgradient of /(•) at a point a; G if 

f{z) > f{x) -(z- x)'^g (III.98) 

for all z EMf’', 

(ii) an e-subgradient of /(■) (e > 0) at a point a; G if 

f{z) > f{x) -{z- xfg - e (III.99) 

for all z EMt^. 

(hi) The set of all subgradients of a convex function /(•) at a; G is called the 
subdifferential of /(■) at a; G which is denoted by df{x). □ 

We consider the following e-subgradient algorithm. 

Algorithm III. 3. e-Subgradient Algorithm 
Data: xq E 

Parameters: a > 0, e > 0. 

Step 1. Set i = 0. 

Step 2. Compute yi eY such that 

0(a:i, Vi) > fjixi) - e. (III.100) 

Step 3. Determine the next iterate 

Xi+i = Xi- aV^(t){xi, Vi). (III.101) 

Step 4. Replace i by i -|- 1, and go to Step 2. □ 

The key step of Algorithm III.3 is Step 2, where we hud a. yi E Y that has a 
value within e of 'f{xi). Under the assumption that for all y eY, 0(-, y) is convex for 
all X E the search direction V x4>{,Xi,yi) is an e-subgradient of fjf) at Xi. 
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In Algorithm III.3, we use constant stepsize a. There are two other step- 
size rules for subgradient algorithms, diminishing stepsize and dynamically-chosen 
stepsize. We refer to Bertsekas (2010, pp. 272-274) for a detailed discussion of the 
advantages and disadvantages of the three schemes. In the theoretical and numerical 
results that follow, we see that even though the simplest scheme of constant step- 
size is considered, the rate of decay of error bound for the e-subgradient algorithm 
is fundamentally better than the discretization approach as its rate of decay of er¬ 
ror bound does not depend on the uncertainty dimension, unlike the discretization 
case. However, as stated next, we need an additional concavity assumption for the 
e-subgradient algorithm. 

Assumption III.20. The functions are convex for all y eY, and (^{x, •) are 

concave for all x EMf-. □ 

The above assumption is necessary as subgradient algorithms only handle con¬ 
vex problems. In addition, the concavity assumption is required to ensure that the 
global maximization step in Step 2 of Algorithm III.3 can be completed in hnite time. 
We compare that to the assumptions for discretization algorithms, where strong con¬ 
vexity on 4>{-,y) for all y E Y is required, but no concavity assumption is necessary. 
We refer to problems that satisfy Assumption III.20 as convex-concave problems. 

We also need the following assumption on the boundedness of the subgradients. 

Assumption III.21. For any bounded set S C there exists an s < oo such that 

sup {WglWg ^ dfj{xi)} < s. (III.102) 

i&no,xi&s 

□ 

We obtain the following convergence result for Algorithm III.3 from Bertsekas 
(2010, p. 349). 
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Proposition III.22. Suppose that for all y eY, is convex on //{xjjjgNo 

is a sequence generated by Algorithm III. 3, then for all i G No and x G 


X - Xi+i\\‘^ < ||a; - XiW'^ - 2q; [ijj{xi) - '4){x) - e] + Q;^||5'*|l^, (III.103) 


where a and e > 0 are as in Algorithm III.3, and pi G is an e-subgradient of fj^-) 
at Xi. □ 

We denote the optimal solution set of (SMX) by X* = [x E = '?/’*} 

and the distance of the initial point xq to X* by d{xo) = irnux^x* ||a:o — a;||. We 
follow the arguments in the convergence analyses of Bertsekas (2010, Section 6.3) 
on subgradient algorithm to derive the following two convergence results for the e- 
subgradient algorithm, Algorithm III.3. 


Proposition III.23. Suppose that Assumption III.21 holds, and that (f>{-,y) are con¬ 
vex for all y eY . If the sequence is generated by Algorithm III.3 in solving 

(SMX), then 


lim inf 'ip{xi) < 'ip* -\ -h e, 

i^oo 2 


(III.104) 


where a and e > 0 are as in Algorithm III.3, and s is as in Assumption III.21. 


Proof. The proof follows the same arguments as that for the subgradient algorithm 
in Bertsekas (2010, p. 275), with the difference that (III.103) is used here for the 
e-subgradient algorithm instead of the corresponding equation for the subgradient 
algorithm. □ 

The next result gives an estimate of the number of iterations required by 
Algorithm III.3 to attain an error tolerance of (q;s^/2) + e + e'/2, for any e' > 0. 

Theorem III.24. Suppose that the functions (j){-,y) are convex for all y E Y. Suppose 
also that Assumption III.21 holds, and the sequence {xjjjgisjQ is generated by Algorithm 
III. 3 in solving (SMX). If X* is nonempty, then for any e' > 0, 

-I- 2e -I- e' 

min f;{xi) - fj* < , (III.105) 

0<i<K 2 



where 


K = 


d{xof 


ae 


(III.106) 


a and e > 0 are as in Algorithm III.3, and s is as in Assumption III.21. 


Proof. We follow the proof for the subgradient algorithm in Bertsekas (2010, Propo¬ 
sition 6.3.3), with (III.103) replacing the inequality 6.3.1(a) of Bertsekas (2010). For 
the sake of contradiction, we assume that (III. 105) does not hold. Thus, for all i such 
that 0 < i < K, 


ii{xi) -i)* -e> 


a.s‘^ 4- e' 


(III.107) 


Using this relation in (III. 103), with x G X*, we obtain for all i such that 0 < i < iF, 


min \\xi+i — x*\\'^ 

< 

min 

\\Xi - 

- X 

x*&X* ' ' 


x*&X* 




< 

min 

\\Xi - 

- X 



x*&X* 




< 

min 

\\xi - 

- X' 



x*&X* 




* II2 


2 o + , 2 2 

— 2q;--- \- a s 


, 2„2 


Applying (III. 108) recursively, we obtain 


(III.108) 


0< min \\xi+i—x*\\‘^< min \\xq — x*\\‘^ — {K + l)ae'. 

x*&X* x*&X* 


Solving for K gives 


K < 


mm xn — X 
x*&X* 

ae' 


* II2 


^ ^ d{x^f _ ^ 


ae' 


(III.109) 


(III.llO) 


which contradicts (III. 106). □ 

The following assumption regarding the computational work required for func¬ 
tion and gradient evaluations provide the basis for analyzing the computational work 
required for Algorithm III.3 to solve (SMX). 

Assumption III.25. There exist constants 'y,a',a” < oo such that for any x G 
y E Y C. M™, the computational work to evaluate any of the three functions (j){x,y), 
Vx4>{x,y), orVy4>{x,y) is no larger than □ 
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The following assumption ensures that Algorithm III.3 generates bounded se¬ 
quences. 

Assumption III.26. The level set 

L{xo) = {x\ij{x) < (III.Ill) 

is bounded, for all xq G where xq is as in Algorithm III. 3. □ 


We refer to the iterations of Algorithm III.3 as major iterations and the it¬ 
erations of the algorithm map in Step 2 of Algorithm III.3 as minor iterations. We 
denote the minor iterate during the major iteration as yi^. 


Assumption III.27. A linearly convergent algorithm map is used in Step 2 of Algo¬ 
rithm III.3, i.e., there exist a c G (0,1) such that 


xf) (f)(^Xi, yijj^i 

tlj{xi) - (j){xi,yij) 

for all j > 1. In addition, the computational work in the linearly converging algorithm 
is no larger than a constant number of function and gradient evaluations at each 
iteration. □ 


< c 


(III.112) 


We dehne 

sup |0(a;,|/i) - 0(a;, 1 / 2 ) 1 . (III.113) 

,yi,y2€iY 

The next result provides an upper bound on the computational work of Algorithm 
III.3. 


Theorem III.28. Suppose that Assumptions III.2, III.I 4 , III.20, III.25, III.26, 
III.27 hold, and X* is nonempty. Then for any xo G e',Q; > 0, there exist 
constants a, a', a", c', c" < 00 , such that the computational work in Algorithm III. 3 to 
generate {xijfSQ, to solve (SMX), while satisfying (III.105) and (III.106) is no larger 
than 


a 

ae' 



log(e/e, 


max \ 
0 ) 


logc 



if e < e 


max 
0 5 


(III.114) 
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and 


am^ d'^ 


ae 


otherwise. 


(III.115) 


Proof. The main computational work in Algorithm III.3 is in Step 2, to determine 
an e-maximizer Ui. Since Y is bounded, and for all x G 0(a;, •) is globally Lipschitz 
continuous according to Assumption III.2, < cxd. Based on Assumption III.27, 

the number of iterations required to obtain yi such that 'ip{xi) — (f){xi,yi) < e' is no 
larger than (log(e/e™^"")/ logc) -|- 1 if e < and equals zero if e > Since the 
computational work in the linearly convergent algorithm is no larger than a constant 
number of function and gradient evaluations at each iteration, based on Assumption 
III.25, there exist constants 7 , a', a" < 00 such that the computational work at each 
iteration is no larger than 

The main computational work in Step 3 of Algorithm III.3 is the computation 
of the gradient Vx(t>{xi,yi), and based on Assumption III.25, there exist constants 
C,c',c" < 00 such that the computational work for Step 3 is no larger than (m'^'d'^”. 

Based on Assumption III.26 and (III.106), there exists a /3 < 00 such that 

d{xoY 


K = 


ae' 


^ d{xoY ^ 


(III.116) 


Thus the overall computational work for Algorithm III.3 to generate {xi}^^, 
satisfying (III. 105) and (III. 106) is no larger than 


ae' 




where a = max{/3^,/^y}, and 


logc 


c' jc" 

am d 

-;— otherwise, 

ae' 


if e < 


(III.117) 


where a = 

From (III. 105), we define the error bound for Algorithm III.3 as 


(III.118) 

□ 


subgrad ^ a^S -|- 2efe -|- 6^ 


(III.119) 
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with ab, eb, e'^ > 0 satisfying the following inequalities for b eN, 


\ + 1^1 < & if g, < g”"", 


(III.120) 


- - — < b otherwise. 

OLh^b 


(III.121) 


We call a sequence {e?,, e'j,, 6 G N a feasible sequence if it satishes 

(III.120) and (III.121). 

The next result states that Algorithm III.3 achieves an asymptotic rate of 
decay of error bound of 6“^/^ as 6 —)■ cxd. 


Theorem III.29. Suppose that Assumptions III.2, III.I 4 , 111.20, III.21, III.25, 
III. 26, and III. 27 hold. If Algorithm III. 3 is used to solve (SMX), then for all arbitrary 
feasible sequences of {eb,e'ij,ab}beN, 


1 SUDS 

b^oo log b 


subgrad 


> —. 
- 2 


For all b eN, if Cb = = mf Vb and if 


(III.122) 


ab = ^ m^'d^" + whenever (III.123) 


C 

am a 

ab = — :r -,— otherwise, 

K 


(III.124) 


l^ggsubgrad 

hm —^—-— 
b^oo log b 


(III.125) 


Proof. For any arbitrary feasible sequence of {ej,, e'^,, afejfegH, we hrst consider the set 


.4, C N, where Cb > for all b E A. Based on (III. 119), > g^^x 

b E A. Since > 0 by dehnition, there exists a bQ eN such that 

log^subgrad ^ 

log 6 ~ 2 


> elP®"" for all 


(III.126) 


for all 6 G 4., 6 > bo. 
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Next, we consider those b E N — A (where ‘ —’ represents the set difference 
operator), where eb < Based on (III. 120), we obtain that 




We substitute ab into (III. 119), and we obtain that 


(III.127) 


subgrad 


2 K [ V logc 

am‘^'(f" s'^ am^^'d!^" /log(efe/e“®'^) 


> — m^(r +rrfd^^ 
2 be', 


+ 1 ) s +efe + — 


■ ^ ° V l +eb+i. (III.128) 
log c / 2 


Taking log on both sides, we obtain that 


subgrad 


log ^ exp log arrf'd^”s'^ — log 266^ + exp log arrf'd^'” — log 2 be[ 

+ log + l) ] + exp [log e.] + exp [log |] ) 


> log ^ max I exp log am'^ db^ — log 26e[, , 


exp log anf'd‘^"— log 26e[, + log 


log(efe/e(j 


+1 ) 


exp [log eb], exp log ^ 


max I log ^exp log am^ d^ — log 2 beb j , 


log ( exp log arrf' d°' — log 2 beb + log 


\og{eb/el 


log (exp [log eb ]), log (exp log - 


= max < log am'^ d^ — log26eb. 


log arrf d" — log 2 beb + log 


logeb,log^ b 


\og{eb/el 


+1 ) 


(III.129) 
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Hence, for any b E N — A,b > 1, we obtain that 

(\og am^'d^” s'^ loj 

1 


> max 


log am^ log 2 beh 

log b log b ’ 

/ // o U,r ( I 1 A 

log am“ d°‘ log26ef, ^ logc + 

log b log b log b 

logeb logy ] . 

log 6 ’ log 6 I ' 


(III.130) 


For the sake of contradiction, we assume that there exists a feasible sequence 
{cfe, e'b, ab}beN-A such that 


1 suoe 

b^oo log b 


subgrad 

b 


(III.131) 


This implies that there exists a 5 > 0, a 6 i > 1, and an inhnite subsequence B C N —^ 
such that 


log^subgrad 

log 6 ^2 

for all 6 > 6 i, 6 G H. 

From (III. 130) and (III. 132), we obtain that 


(III.132) 


log log 2 k;, logc ^ 1 j fill 1341 

log b log 6 log 6 2 ’ 

^^<---5, (III.135) 

log 6 2 ’ ^ ^ 

and , 

(III.136) 

log 6 2 ’ ^ ^ 

for all 6 > 6 i, 6 G H. Based on (III.134) and (III.136), there exists 5 > 0, 62 > &i, and 
an inhnite subsequence B such that 


_1 _ loge), ^ 

log 6 2 2 


(III.137) 
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and 


log 6 2 

for all b > b 2 ,b G B. 


From (III. 137), we obtain that 




log 4 

log 6 



(III.138) 


(III.139) 


which contradicts (III. 138). 

Thus, the conclusion follows for the hrst part of the theorem. 

Next, we prove the second part of the theorem. Since eb = m/Vb, there exists 
a 6i G N such that eb < e™®'"". Since we are concerned with the asymptotic rate of 
decay of error bound ^ -^yg only need to consider the case where 

eb < e™®"". Thus, substituting eb = = m/y/b and ab as in (III.123) into (III.119), 

we obtain that 


subgrad 


2 my/b 




log 




+ 1 


Vb 


am 


log c 

am'^'d^-” /log (egiaxy^ 


m m 


s" + ^ + 


2 m 


2 m 


logc 


. 1 3^ 

+ 1 I+- 


y/b 2y/b 

. (III.140) 


Taking logs on both sides of (III. 140), we obtain that 


log e\ 


subgrad _ 


log + log 


am^ .s"‘ am 

+ 






2 m 


2 m 


log c 


1 I 3^ 


(III.141) 
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We consider the second term in (III.141), and obtain that 


lim log 

b—^oo 


am 


2 m 2m log c / 2 


lim log 

b^oo 


am^-'d ^-”/log 


/ 


’^^/b 


2 m 


logc 


+ 1 


ara°' i 3m 

2m “^2 


am^ 

2m 


log 


logc 


+ 1 


+ 1 


yj 

(III.142) 


Based on the continuity of the log function, and the fact that 


lim 

b—^oo 


am°'' (p” _l_ 3m 

2m '^2 


log 




= 0 , 


2m 


logc 


+ 1 


(III.143) 


the right-hand side of (III. 142) simplihes to 

am^-'d '^"/log 


lim log 

b^oo 


2 m \ log c 
From (III. 141) and (III. 144), we obtain that 


+ 1 






/ log 

1 subgrad Incr d_ ^ 1 logc 

hm ^"g/V =lim^+hm ^ 


+ 1 


b^oo log b b^oo log b b^oo 


log b 


am^ s‘‘ 


(III.144) 


(III.145) 


as limfe^oo — liiFii- - ~ ^PP^yi'^g L’Hopital’s rule on the second term of the 

right-hand side, we obtain that 


lim 

b^oo 


/ log 

log I — logc 




+ 1 




log b 


= lim 

b^oo 




m log c I I 20^1^^53/2 
^max^ / \ 0 


1/6 


= lim 

b^oo 


log 




logc 


+ 1 / 


2 logc 


= 0, (III.146) 
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and the conclusion follows. 


□ 


We see from Theorem III.29 that a certain choice of the parameters eb, e'^,, and 
ab, b eN, results in an asymptotic rate of decay of error bound jf 

compare against the fastest rate of decay for a discretization algorithm of we 

see that for moderate and large m, discretization algorithms may not be competitive 
against Algorithm III.3. This difference in the rate of decay of error bound is observed 
in the numerical results in the next section. 

D. NUMERICAL RESULTS 

In this section, we provide numerical evidence to validate some of the key 
theoretical results obtained in Sections III.B and III.C. Proposition III.7 indicates 
that the asymptotic rate of decay of error bounds for discretization algorithms is no 
faster than ag 5 oo_ We compare that with the e-subgradient algorithm, 

Algorithm III.3, where Theorem III.29 indicates that a rate of 6“^/^ as 6 —)■ cxd is 
attainable. The dependence on m, the uncertainty dimension for the discretization 
algorithms, implies that under certain convexity-concavity assumptions, discretiza¬ 
tion algorithms will likely not be competitive against e-subgradient algorithms for 
semi-inhnite problems with high uncertainty dimension. In this section, we provide 
some indication on the range of values of m where discretization algorithms are not 
competitive with e-subgradient algorithms for convex-concave problems. 

From Theorems III.9 and III.11, the asymptotic rate of decay of error bound 
for a discretization algorithm that uses a quadratically convergent algorithm map is 
the same as that of a linearly convergent algorithm map. In this section, we examine 
if there is any numerical difference between a superlinearly convergent algorithm map 
and a linearly convergent algorithm map. 

We compare the following algorithms over a set of problem instances from 
Rustem and Howe (2002): 



(i) Algorithm III.l with e-PPP (an active-set version of PPP as stated in Algorithm 
2.4.34 in Polak 1997; see also Polak 2008) as the algorithm map. 

(ii) Algorithm III.l with SQP-2QP (Algorithm 2.1 of Zhou & Tits 1996, an SQP 
algorithm with two QPs) as the algorithm map. 

(iii) The e-subgradient algorithm, Algorithm III.3. 

The first two algorithms are discretization algorithms, while the third is an e-sub- 
gradient algorithm. We refer to Appendix D for details on the algorithms and the 
algorithm parameters used. 

We use Problems 1 and 5 from Rustem and Howe (2002, pp. 100-102), which 
are two- and three-dimensional in y, respectively. We also modify Problem 1 to 
create a one-dimensional (in y) problem instance. We call the three problem instances 
SProbA, SProbB, and SProbC, in increasing order of ^-dimensionality; see Appendix 
C for details. 

Similar to Chapter II, we implement and run all algorithms in MATLAB 
version 7.7.0 (R2008b) (see Mathworks, 2009) on a 3.73 GHz PC using Windows XP 
SP3, with 3 GB of RAM. All QPs are solved using TOMLAB CPLEX version 7.0 
(R7.0.0) (see Tomlab, 2009) with the Primal Simplex option. In Step 2 of Algorithm 
HI.3, we use TOMLAB SNOPT version 7.2-5 (see Gill, Murray, & Saunders, 2007) 
to hnd the ^/-maximizer. 

In Ghapter H, we consider problem instances with uncertainty dimension of 
one, and we use discretization parameters N in the order of 10® to achieve reasonable 
solution tolerance. In all the hnite minimax algorithms considered (including SQP- 
2QP and e-PPP), one of the steps is to compute the function values at all the grid 
points at the current iterate. In Ghapter H, we implement the function evaluation 
step using vector operations on all the grid points in a single line of code, instead 
of “looping” through each grid point, to ensure better efficiency. In this chapter, we 
consider problems with uncertainty dimensions higher than one, and the discretization 
parameters required to achieve reasonable solution tolerance increase to orders of 10® 
and above. This requires too much memory if the same implementation as that in 
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Chapter II is used. Thus, we use “looping,” evaluating subsets of the grid points in 
each loop. The different implementation is applied for SProbB [m = 2) and SProbC 
{m = 3), since the original more efficient implementation still works for SProbA 
(m = 1). Note that this issue does not affect e-subgradient algorithms as they do not 
deal with grid points. 

We report run times to achieve a solution x that satisfies 

||a;_a;t-get|| (III.147) 

where the error tolerance t = 10“^, 10“^, 10“^, 10“^, 10“^, and is a target solu¬ 

tion (see Table 18 in Appendix C) obtained by Algorithm III.3. We refer to Appendix 
C for details on the procedure to obtain Algorithm III.3 is chosen for these ver- 

ihcation analysis as preliminary experiments show that it is signihcantly more efficient 
than the other two algorithms, especially for problems with uncertainty dimension 
m > 2. Although the termination criterion (III. 147) is not possible for real-world 
problems, as x*®''’®®* is usually unknown beforehand, we find that it is the most useful 
criterion in this study. 

1. Problem Instance of Uncertainty Dimension One 

Table 8 summarizes the run times (in seconds) of Algorithm III.l with e-PPP, 
for various discretization parameter N/, across the top row, to achieve various error 
tolerances t listed in the hrst column. Run times in boldface indicate the particular 
discretization parameter Nb that produces the shortest run time for the specihc error 
tolerance t. An asterisk * in the table indicates that the particular discretization 
parameter is insufficient to achieve the desired error tolerance. For example, in Table 
8 with Nb = 1,000, we observe that the iterates do not change after a certain time 
(within six hours), and the required error tolerance of t = 10“^ has not been met. 
A double asterisks ** indicate that the algorithm failed to satisfy the required error 
tolerance after six hours. Preliminary experiments show that Algorithm III.3 produces 
run times no slower than ten seconds for all problem instances considered. Thus, 
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we choose an arbitrary maximum run time of six hours (signihcantly longer than 
ten seconds). As mentioned, the MATLAB implementation for e-PPP on SProbA 
computes the function values in a single line of code. As there is insufficient memory 
to store the function values of all Ni, =100,000,000 functions, that leads to the memory 
issues as indicated by “mem” in Table 8. 


t\Nb 

100 

1,000 

10,000 

100,000 

1,000,000 

10,000,000 

100,000,000 

10-1 

0.83 (7) 

0.57 ( 7 ) 

0.66 (7) 

1.6 (7) 

11.4 (7) 

107.6 (7) 

mem 

10-1= 

0.84 (10) 

0.77 ( 10 ) 

0.98 (10) 

2.2 (10) 

15.8 (10) 

149.0 (10) 

mem 

10-=1 

* 

2.2 (13) 

1.5 ( 12 ) 

3.9 (12) 

22.4 (12) 

207.1 (12) 

mem 

10-4 

* 

* 

4.5 ( 15 ) 

9.1 (15) 

36.7 (14) 

334.7 (14) 

mem 

10-5 

* 

* 

* 

** 

** 

** 

mem 


Table 8. Run times (in seconds) for SProbA using Algorithm III.l with e-PPP. The 
numbers in parentheses indicate the number of iterations. An asterisk * indicates 
that the particular discretization parameter is insufficient to achieve the desired error 
tolerance, while a double asterisk ** indicates that (III. 147) is not satished after six 
hours. The word “mem” means that the algorithm terminates due to insufficient 
memory. 


Table 9 summarizes the run times of Algorithm III.l with SQP-2QP. The faster 
run times in Table 9 compared to Table 8 are due to the superlinear rate of convergence 
of the SQP-2QP algorithm map compared to the linear rate of convergence of e-PPP. 

We see from Tables 8 and 9, as well as subsequent run times for SProbB and 
SProbC that the discretization parameter Nb that produces the fastest run times, 
varies between problems and tolerances. Thus, it is difficult to determine the “right” 
discretization parameter to use. 

Table 10 summarizes the run times of Algorithm III.3 for SProbA. Comparing 
the run times for the three algorithms (ignoring the issue that discretization param¬ 
eters are difficult to determine), we see that the discretization algorithms (Tables 8 
and 9) are generally competitive against the e-subgradient algorithm (Table 10) for 
problems with uncertainty dimension of one. 
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t\Nb 

100 

1,000 

10,000 

100,000 

1,000,000 

10,000,000 

100,000,000 

10-1 

0.12 (5) 

0.14 (5) 

0.11 (5) 

0.51 (5) 

4.4 (5) 

42.6 (5) 

mem 

10-^ 

0.17 (6) 

0.18 (6) 

0.12 (6) 

0.59 (6) 

5.1 (6) 

48.8 (6) 

mem 

10-^ 

* 

0.15 (7) 

0.13 (7) 

0.65 (7) 

5.7 (7) 

55.1 (7) 

mem 

10-4 

* 

* 

0.13 (8) 

0.81 (8) 

6.4 (8) 

61.4 (8) 

mem 

lO-i* 

* 

* 

* 



* 

mem 


Table 9. Run times (in seconds) for SProbA using Algorithm III.l with SQP-2QP. 
The numbers in parentheses indicate the number of iterations. An asterisk * indicates 
that the particular discretization parameter is insufficient to achieve the desired error 
tolerance. The word “mem” means that the algorithm terminates due to insufficient 
memory. 


t 

10-1 

10 -^ 

10 -^ 

o 

1 

o 

1 

Cjl 

Rnn times 

0.18 (2) 

0.25 (3) 

0.36 (4) 

0.34 (5) 0.45 (6) 


Table 10. Rnn times (in seconds) for SProbA using Algorithm III.3. The numbers in 
parentheses indicate the number of iterations. 

2. Problem Instance of Uncertainty Dimension Two 

Tables 11-13 snmmarize the run times for SProbB. The discretization param¬ 
eter Nh is chosen such that G N. The run times for the two discretization 

algorithms are generally an order of magnitude slower than those for SProbA, while 
the rnn times for Algorithm III.3 are still within the same order of magnitude. These 
results provide some validation to the rate of decay of error bonnd obtained 

for the two discretization algorithms in Theorems III.9 and III. 11, and the 
for Algorithm III.3 in Theorem III.29. 

3. Problem Instance of Uncertainty Dimension Three 

Tables 14-16 summarize the run times for SProbC. We see more evidence 
of the independence of the rate of decay of error bonnd on m for Algorithm III.3. 
Specihcally, the ratio of run times for Algorithm III.3 to attain error tolerances of 
10“^ and 10“^ are 0.45/0.18 = 2.5 (SProbA where m = 1), 1.1/0.21 = 5.2 (SProbB 
where m = 2), and 4.0/1.6 = 2.5 (SProbC where m = 3). These ratios provide some 
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AN, 

1,024 

10,000 

100,489 

1,000,000 

10,004,569 

100,000,000 

1,000,014,129 

10-1 

2.5 (5) 

1.5 ( 5 ) 

4.9 (5) 

29.6 (5) 

281.9 (5) 

2720 (5) 

** 

10-2 

* 

2.0 ( 6 ) 

6.2 (7) 

39.6 (7) 

380.8 (7) 

3605 (7) 

** 

10-^ 

* 

* 

* 

49.9 ( 9 ) 

535.1 (9) 

4720 (9) 

** 

10-1 

* 

* 

* 

49.4 ( 9 ) 

670.1 (10) 

6899 (11) 

** 

lO-'i 

* 

* 

* 

* 

* 

** 

** 


Table 11. Run times (in seconds) for SProbB using Algorithm III.l with e-PPP. The 
numbers in parentheses indicate the number of iterations. An asterisk * indicates 
that the particular discretization parameter is insufficient to achieve the desired error 
tolerance, while a double asterisk ** indicates that (III. 147) is not satished after six 
hours. 


ANb 

1,024 

10,000 

100,489 

1,000,000 

10,004,569 

100,000,000 

1,000,014,129 

10-1 

0.72 (4) 

0.46 ( 4 ) 

2.3 (5) 

12.4 (4) 

134.4 (5) 

1146 (4) 

10879 (5) 

10-2 

* 

0.53 ( 5 ) 

2.5 (6) 

14.4 (5) 

155.2 (6) 

1346 (5) 

12475 (6) 

10-3 

* 

* 

* 

16.7 ( 6 ) 

175.6 (7) 

1532 (6) 

13881 (7) 

10-1 

* 

* 

* 

* 

* 

1719 ( 7 ) 

15178 (8) 

10-3 

* 

* 

* 

* 

* 

* 

* 


Table 12. Run times (in seconds) for SProbB using Algorithm III.l with SQP-2QP. 
The numbers in parentheses indicate the number of iterations. An asterisk * indicates 
that the particular discretization parameter is insufficient to achieve the desired error 
tolerance. 


t 

o 

1 

o 

1 

to 

o 

1 

CO 

O 

1 

o 

1 

cn 

Run times 

0.21 (5) 0.35 (7) 0.40 (8) 0.71 (9) 1.1 (9) 


Table 13. Run times (in seconds) for SProbB using Algorithm III.3. The numbers in 
parentheses indicate the number of iterations. 
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validation to Theorem III.29, which states that the asymptotic rate of decay of error 
bound for Algorithm III.3 is which is independent of m. 

For discretization algorithms, we see that the increase is strongly dependent 
on m. For Algorithm III.l with e-PPP, the ratio of run times to attain error toler¬ 
ances of 10“^ and 10“^ are 4.5/0.57 = 7.9 (SProbA), 49.4/1.5 = 32.9 (SProbB), and 
> 21,600/1.4 = 15,000 (SProbC). For Algorithm III.l with SQP-2QP, the ratio of 
run times to attain error tolerances of 10“^ and 10“^ are 0.13/0.11 = 1.2 (SProbA), 
1,719/0.46 = 3,737 (SProbB), and > 21,600/0.81 = 26,667 (SProbC). These ob¬ 
servations indicate that the additional computational work to achieve smaller errors 
increases as m increases, which again provides validation to Theorems III.9 and III.11, 
which states that the asymptotic rate of decay of error bounds for Algorithm III.l 
with e-PPP and SQP-2QP are 

Theorems III.9 and III.11 state that the error bounds for the discretization 
algorithms with a quadratically and linearly convergent algorithm map decay at the 
same asymptotic rate of ag 5 oo_ Wg observe generally faster run times for 

Algorithm III.l with SQP-2QP (superlinear) as compared to e-PPP (linear), which 
shows that we are not in asymptotic regime yet. 


t\Ni, 

1,000 

10,648 

103,823 

1,000,000 

10,077,696 

100,544,625 

1,000,000,000 

10-^ 

1.4 ( 5 ) 

3.0 (4) 

13.3 (4) 

74.4 (4) 

491.3 (4) 

3706 (4) 

** 

10-2 

* 

* 

20.8 ( 7 ) 

113.0 (7) 

860.5 (8) 

6017 (7) 

** 

10"^ 

* 

* 

* 

** 

2393 ( 13 ) 

10546 (10) 

** 

10-^ 

* 

* 

* 

** 

** 

** 

** 

10"^ 

* 

* 

* 

** 

** 

** 

** 


Table 14. Run times (in seconds) for SProbC using Algorithm III.l with e-PPP. The 
numbers in parentheses indicate the number of iterations. An asterisk * indicates 
that the particular discretization parameter is insufficient to achieve the desired error 
tolerance, while a double asterisk ** indicates that (III. 147) is not satished after six 
hours. 
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AW 

1,000 

10,648 

103,823 

1 , 000,000 

10 , 077,696 

100 , 544,625 

1 , 000 , 000,000 

10 -^ 

0.81 ( 5 ) 

2.2 ( 5 ) 

11.3 ( 5 ) 

59.6 ( 5 ) 

429.9 ( 5 ) 

4871 ( 5 ) 

** 

10-2 

* 

* 

12.9 ( 6 ) 

69.5 ( 6 ) 

570.7 ( 6 ) 

5658 ( 6 ) 

** 

10 "^ 

* 

* 

* 

* 

665.7 ( 8 ) 

7052 ( 8 ) 

** 

10 -^ 

* 

* 

* 

* 

* 

* 

** 

10 "'^ 

* 

* 

* 

* 

* 

* 

** 


Table 15. Run times (in seconds) for SProbC using Algorithm III.l with SQP-2QP. 
The numbers in parentheses indicate the number of iterations. An asterisk * indicates 
that the particular discretization parameter is insufficient to achieve the desired error 
tolerance, while a double asterisk ** indicates that (III. 147) is not satished after six 
hours. 


t 

10-1 

o 

1 

to 

O 

1 

CO 

O 

1 

O 

1 

cn 

Run times 

1.6 (13) 

1.8 (21) 2.7 (29) 3.9 (37) 4.0 (44) 


Table 16. Run times (in seconds) for SProbC using Algorithm III.3. The numbers in 
parentheses indicate the number of iterations. 

E. CONCLUSIONS FOR SEMI-INFINITE MINIMAX 

This chapter focuses on the discretization approach to solve unconstrained 
semi-inhnite minimax problems. We develop and compare rate-of-convergence results 
for various hxed and adaptive discretization algorithms, as well as an e-subgradient 
algorithm. We present a novel way of expressing rate of convergence, in terms of 
computational work instead of the typical number of iterations. We show that a hxed 
discretization algorithm can achieve the same asymptotic convergence rate attained 
by an adaptive discretization algorithm. We also show that under certain convexity- 
concavity assumptions, the rates of convergence for discretization algorithms depend 
on the uncertainty dimension, while the rate of convergence for an e-subgradient 
algorithm is independent of the uncertainty dimension. This indicates that under 
convexity-concavity assumptions, discretization algorithms are not likely to be com¬ 
petitive with e-subgradient algorithms for problems with large uncertainty dimension. 
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Numerical results show that for convex-concave problems, discretization algorithms 
are not competitive with e-subgradient algorithms for problems with uncertainty di¬ 
mension as small as two. 
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IV. SEMI-INFINITE MIN-MAX-MIN 

PROBLEM 

A. INTRODUCTION 

In this chapter, we consider a generalized semi-hnite min-max-min problem of 
the form 

(GMXM) (IV.l) 

x£X 

where —)■ M is dehned by 

= max min (j)(x,y,z), (IV.2) 

y&Y z&Z(x,y) 

X C and Y C M*” are compact sets, the set-valued function Z : x M™' —)■ 2®“ 

is continuous (see Section 5.3 of Polak, 1997 for a dehnition on the continuity of a 
set-valued function) as well as compact- and nonempty-valued on X x Y, and for all 
{x,y) G X xY and x G Z(x,y), 0(-, •, •) is continuous at {x,y,z). In particular, we 
focus on the special case where Z(-, •) is a constant set Z C M®, but also deal with the 
generalized case. Throughout the chapter, we refer to the case with constant 

set Z as the constant Z case, and the case of the set-valued function Z(-,-) as the 
variable Z case. We denote the semi-inhnite min-max-min problem for the constant 
Z case by (SMXM). Also, we refer to (SMXM) and (GMXM) collectively as (MXM) 
for brevity. 

Applications involving min-max-min optimization include floorplan sizing in 
electronic circuit boards (Ghen & Fan, 1998), obstacle avoidance for robots (Kirjner- 
Neto & Polak, 1998), optimal design centering, tolerancing and tuning problem (Tits, 
1985), geometric facility location problem (Gardinal & Langerman, 2006), and net¬ 
work interdiction problem (Martin, 2007), of which we will give an example. 

The problem (MXM) is difficult to solve due to the layers of min and max 
operators, and, as shown in Ralph and Polak (2000), may not have directional 
derivatives even when ((){■, •, •) is smooth. This implies that dehning suitable optimal- 
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ity conditions is difficult. These difficulties have resulted in a rather limited literature 
on (SMXM), and so far, there is no solution approach for (GMXM). 

Ralph and Polak (2000) propose an approach that deals with (SMXM), mainly 
with X = The assumptions are (i) X, Y, and Z are compact, and (ii) for all 
{y,z) E Y X Z, ((){■, y,z) is continuously differentiable on X. The authors first dis¬ 
cretize Y and Z into Y^ and Zm, where iV, M G N are the cardinality of Y^ and 
Zm, respectively. Then a master algorithm is used to solve sequences of discretized 
min-max-min problems of increasing level of discretization. The hnite min-max-min 
algorithm map used to solve the discretized problems (within the master algorithm) 
applies a method that combines an Armijo-type line search and a trust region ap¬ 
proach. The authors then discuss how an exact penalization method can be used 
to eliminate constraints dehning X, if any are present. The main challenge in the 
approach is, in each iteration of the algorithm map, we need to solve linear pro¬ 
grams to determine the search direction. As noted in the paper, this is expected to 
be a highly computationally intensive task. There are no numerical results in this 
paper. 

In Ralph and Polak (2000), we hnd another approach for (SMXM) with 
X = where the same assumptions and initial discretization step in Ralph and 
Polak (2000) are used. The author then applies exponential smoothing (as described 
in Chapter II) to the innermost minimization problem to obtain a hnite minimax 
problem. The algorithm proposed also consists of a master algorithm that solves se¬ 
quences of the hnite minimax problems with increasing level of discretization, using 
the Pshenichnyi-Pironneau-Polak (PPP) minimax algorithm map. The main chal¬ 
lenge in this algorithm is, as the level of discretization increases, there are more 
functions in Zm- In order to keep the smoothing error small, the smoothing pa¬ 
rameter needs to increase, which may lead to ill-conditioning. Again, there are no 
numerical results in this paper to provide any hint on the numerical performance of 
the algorithm. 
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This chapter proposes a novel approach to solve (MXM). We assume that (i) 
X and Z are compact and convex, and (ii) for all y ^ Y, is continuously 

differentiable and convex on X x Z. We discretize Y into Y^ to obtain a discretized 
min-max-min problem, which we then reformulate into a discretized min-min-max 
problem of larger dimensionality. Finally, we observe that the discretized min-min- 
max problem can be interpreted as a constrained hnite minimax problem. Under 
our convexity assumptions, we show that for any iV G N, if we solve the constrained 
hnite minimax problem, we obtain a global minimizer of the discretized min-max- 
min problem. And if the level of discretization N increases to inhnity, the points 
constructed approach the global minimizer of (MXM). The algorithms in Ralph and 
Polak (2000) and Polak (2003) do not guarantee convergence to a global minimizer 
even under our convexity assumptions. The main challenge in our approach is the 
size of the constrained hnite minimax problem constructed, which has N functions 
and d + Ns variables, where d and s are the dimensionality of X and Z, respectively. 

We hnd similar conversion from a min-max-min problem to a min-min-max 
problem in Martin (2007) for the case with binary variables in the outer min-max, 
where the min-min-max problem provides a lower bound on the optimal objective- 
function value. Another possible way of converting a min-max-min problem into a 
min-min-max problem is to use von Neuman’s minimax theorem (see for example. 
Theorem 5.5.5 of Polak, 1997), but this requires that the sets Y and Z be compact, 
convex, and constant, and for all a; G X and z E Z, ^(a;, -jz) is concave on U, and 
for all a; G X and y eY, ^(a;, y, •) is convex on Z. 

The next section shows that (MXM) arises in network interdiction problems. 
Section C outlines a new approach to solve (MXM). We obtain some numerical results 
in Section D by applying the approach on a network interdiction problem. Section E 
concludes the chapter. 


109 



B. DEFENDER-ATTACKER-DEFENDER EXAMPLE 

In this section, we describe a defender-attacker-defender (DAD) network in¬ 
terdiction problem. We provide two different formulations for the problem, one in 
the form of (SMXM) and another in the form of (GMXM). 

We consider a network G with node set V and arc set E. A node u ^ V can 
provide nonnegative supply up to a maximum of ubsupplt/u at the cost of costsupplt/u 
per unit of supply. Node u requires also a given demandu of supplies. A defender first 
decides how much supply to place at node m, which we denote by SUPPLYu- 

Second, an attacker decides on the quantity of sorties, SORTIEu^v to attack 
each arc {u, v) G E, subject to a maximum of totalsorties, with the intent to maximize 
the defender’s cost to be dehned later. We use an exponential damage function; see 
for example Nugent (1969); Capps (1970), to model the capacity reduction of an arc 
that is attacked. We consider SORTIEu^v as a continuous variable as we assume that 
aircraft carry bomb loads that can be distributed in any way over several arcs. Note 
that our proposed approach can handle integer restrictions on the decision variables 
for the maximization in (MXM), which is SORTIEu^v in the DAD problem. 

Third, the defender sends flow of supplies between nodes in an attempt to 
meet demand. The parameters IbfloWu^v and ubfloWu^y represent the lower and upper 
bounds on the flow across arc {u,v) before the attack, and vulcapu,v represents the 
amount of capacity vulnerable to attack for arc {u, v). Based on Nugent (1969); Capps 
(1970), the remaining capacity of arc {u,v) after the attack is: 

ubflow^ .^ — vulcapy^ y [1 — exp{—vulu,vSORTIEu,v)], (IV.3) 

where vulu^v represents the vulnerability of arc {u,v). A larger value of vulu,v repre¬ 
sents that the arc is more vulnerable to attacks. The vulnerability parameter vulu^^ 
indicates the efficiency of a sortie against arc {u,v). 

The objective function in the problem is the sum of (i) the cost to place supply 
at nodes and (ii) the cost to send flow through the network after the attack to satisfy 
demands. We model the nonlinear effects of congestion on the cost of sending flow 
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(see for example p. 651 of Ahuja, Magnanti, & Orlin, 1993) across arc {u,v) by 

costflow^^^FLOWu,v 

ubflow^.^ — FLOWu,v + e’ 

where e > 0 is a small number to ensure that the cost of flow remains bounded as 
FLOWu^v approaches ubfloWu,v 

In this problem, we assume perfect information, i.e., both the defender and 
attacker know the full characteristics of the network in terms of bounds on the flow 
on each arc, the vulnerability of each arc, etc. We also assume that the defender 
knows the maximum sorties that the attacker can launch, the attacker knows where 
the supplies are placed before launching the sorties, and dually, the defender knows 
the remaining capacity of all the arcs in the network before sending flow to satisfy 
the demands. 

We provide the formulation in both forms of (SMXM) and (GMXM) next, with 
detailed explanation following the model descriptions. In (SMXM), the feasible region 
of the inner minimization problem must be independent of the decision variables for 
the outer minimization and the maximization parts. Hence, the capacity and balance 
of flow constraints are accounted for, by using penalty terms in the objective function. 
In the case of (GMXM), capacity and balance of flow are imposed as constraints. 
Indices 


ueV 
{u,v) e E 

Data 


costsupplt/u 

C0Stfl0Wu,v 

demandu 

e 

lbfl0Wu,v 

penbal 

pencap 

totalsorties 


node (alias v) 

arc directed from node u to node v 


cost to place supply at node u 

cost coefficient for flow between nodes u and v 

demand at node u 

a small number that ensures bounded flow cost 
nonnegative lower bound on flow on arc {u, v) 
penalty parameter for violation of flow balance constraints 
penalty parameter for violation of capacity constraints 
total number of attacker sorties the attacker can fly 
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ubfloWu,v 

ubsupplt/u 

vulcapu^v 

Decision Variables 


upper bound on flow on arc (m, v) 
upper bound on supplies at node u 
vulnerability of arc (m, v) 

amount of capacity of arc (m, v) vulnerable to attack 


EXCESSSUPPLYu 

EXTRASUPPLYu 

PLOW^,, 

SORTIE^^^ 

SUPPLY^ 


recourse supply removed from node u 
recourse supply placed at node u 
flow from node u to node v 
number of sorties to attack arc {u, v) 
amount of supply to be placed at node u 


We denote the vector that contains all the components PLOWu^v, {u,v) E E hj 
PLOW, and similarly for EXCESSSUPPLY, EXTRASUPPLY, SORTIE, and SUP¬ 
PLY. 


DAD Formulation of the form (SMXM): constant Z case 


min > costsupply., SUPPLYu + > 

FLOW ^ ^ 


costfiow^ yFLOWu,v 


uGV 


-E penbal^ 

u£V 


ubflow^^^ — FLOWu,v + e 
2 


{u,v)^E 

E FLOW.a,v - E flow.,, 

v.{u,v)^E v:{v,u)^E 

— SUPPLY u + demandu 


max < 
SORTIE 


mm < 
SUPPLY 



1 

r 0, i 



max < 

FLOWu.v — ubflow,,,+ , 


{u,v)£E 

1 

[ vulcap,,,[l — exp{—vul,,,SORTIEu,v)] J 



2 > 


s.t. < FLOWu,v < ubflow.^^.^ \/{u,v) £ E 


s.t. SORTIEu,v < totalsorties 

(u,v)£E 

SORTIEu,v > 0 V(w,h') G E 


s.t. 0 < SUPPLYu ^ ubsupply^ \/u £ V 
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DAD Formulation of the form (GMXM): variable Z case 


min < 
SUPPLY I 


max < 
SORTIE I 


min 

FLOW, EXCESSS UPPLY, 
EXTRASUPPLY 


E 

u£V 


costsupply^SUPPLYu + 


E 

{u,v)£E 


costflow,,^,^FLOWu,v 


'u6/?oui^ „ — FLOWu,v + e 


+ penbal,,^EXCESSSUPPLYu + Y penbal,^EXTRASUPPLY„ 

uGV u£V 


s.t. E FLOWu,v - E FLOWv.u = SUPPLY^ - demand^ 

v.{u,v)^E v.{v ,u)^E 


- EXCESSSUPPLY^, + EXTRASUPPLY,, Vu £ V 


Z6/?oui^ „ < FLOWu,v 

< .j, — vulcap,,^ ,^ [1 — exp{—vul„^„SORTIE„^A\ V(u,v) e E 

EXCESSSUPPLY„ >0 Vn e V 
EXTRASUPPLYu >0 Vn e V 
s.t. SORTIEu,v < totalsorties 

(u,v)GE 

SORTIE,,^,, > 0 V(n,'!;) S E 


s.t. 0 < SUPPLYu < ubsupply„ Vn S V 


> 


Discussion 

The main differences between the two formulations he in the way that the 
capacity and balance of flow constraints are modeled. In the variable Z case, we model 
them explicitly. The dummy variables EXCESSSUPPLY and EXTRASUPPLY are 
included to ensure that the model remains feasible even if the sorties have reduced 
the capacity of the network to a level where (i) excess supply cannot flow out from a 
node or (ii) demand at a node is not fully satished. And if that happens, we penalize 
the violation based on (i) the excess supply that needs to be removed or (ii) the extra 
supply required to satisfy demand fully. 

For the constant Z formulation, we model the capacity and balance of flow 
constraints by including penalty cost terms in the objective function. This allows 
us to have the inner constraint set being a constant set dehned by 0 < „ < 

PLOWu,v < for all arcs {u,v) G E. 

The objective functions for both cases express the sum of (i) the cost of placing 
supply at supply nodes, (ii) cost of sending flow through the network after the attack 
to satisfy demands, considering congestion effects, and (iii) penalty terms for violation 
of capacity or balance of flow constraints. The other constraints are self-explanatory. 

We note that if we view SORTIE^ ,,, as constants instead of decision variables, 
the objective functions and feasible sets in both formulations are convex, and the 
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feasible region is defined by linear functions and box constraints. We will revisit this 
issue when we discuss our new approach to solve (MXM). 

C. APPROACH TO SOLVE THE MIN-MAX-MIN PROB¬ 
LEM 

In this section, we propose an approach to solve (MXM) by constructing a 
constrained hnite minimax problem. We use this hnite minimax problem to obtain 
an approximation to a global minimizer of (MXM) under certain assumptions on the 
algorithm used to solve the hnite minimax problem. 

Before we describe the approach to solve (MXM), we state two assumptions 
for Z that will be used repeatedly throughout the chapter. Assumption IV. 1 will be 
used when we consider the constant Z case, while Assumption IV.2 will be used when 
we consider the variable Z case. 

Assumption IV.1. X C W’', Y C MX, and Z C are compact sets, and 0(-, •, •) is 
continuous on X x Y x Z. 

Assumption IV.2. X C and Y C are compact sets, the set-valued function 
Z : X 2®"* is continuous as well as compact- and nonempty-valued on XxY, 

and for all {x,y) E X x Y and z E Z{x,y), ■, ■) is continuous at {x, y, z). 

1. Constructing a Finite Minimax Problem 

In this subsection, we construct a hnite minimax problem from (MXM). We 
hrst discretize the set Y C M™' to obtain a discretized min-max-min problem. Next, 
we show that the inner max-min problem is equivalent to a min-max problem. We 
then observe that the min-min-max problem can be interpreted as a hnite minimax 
problem, but with more variables than (MXM). We cover the details next. 

a. Discretized Min-Max-Min Problem 

We introduce the discretized problem for (GMXM): 

(GMXM^) min^jv(a:), (IV.5) 
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where ^ N G N is defined by 


'ipNix) = max min (f){x,y,z), (IV.6) 

y£Ypf z&Z{x,y) 

Y]\f C V, iVvl = iV G N, satisfy the property dist(YAr,V) ^ 0 as iV ^ oo, with 
dist(-, •) being the Hausdorff distance operator defined on p. 65. Under Assumption 
IV.2, (GMXMjv) is well-defined. An example of a discretization scheme that produces 
Vv with the above property is the uniform grid discussed on p. 65 for the case when 
V is a hyper-box. We denote the equivalent discretized problem for (SMXM) by 
(SMXM^). 

We need the following notation for subsequent analysis of (GMXM): 


u{x,y)= min (j){x,y,z), 

zez{x,y) 

(IV.7) 

Z{x, y)= [ze Z{x, y) \ 4>{x, y, z) = uj{x, y)} , 

(IV.8) 

Y{x)= {y eY \ uj{x, y) = %l){x)} , 

(IV.9) 

= {y eYn \ uj{x, y) = iPn{x)} . 

(IV.IO) 


We next show the continuity of the functions a;(-, •), and for 
both the constant and variable Z case. 

Proposition IV. 3. Suppose that Assumption IV.1 holds. Then the functions •), 
and '^Ar(-), V G N are continuous on X xY and X, respectively. 

Proof. We refer to Polak (1997, Section 5.4) for the proof. □ 

Proposition IV. 4. Suppose that Assumption IV.2 holds. Then the functions u{-, •), 
and '^Ar(-), V G N are continuous on X x Y and X, respectively. 

Proof. Based on Gorollary 5.4.2 of Polak (1997), Ci;(-, •) is continuous on X x V. 
Hence, based on Propositions I.l and 1.2, '^(•) and t/’w(‘) are also continuous on 
X. □ 


115 



We next show that the discretized min-max-min problems epi-converge 
to (MXM). We hrst consider the constant Z case. 

Lemma IV. 5. Suppose that Assumption IV. 1 holds. The sequence of problems 
{(SMXM 7 v)}AreN epi-converges to (SMXM) as N ^ oo. 

Proof. The proof follows the same arguments as the epi-convergence proof in Theo¬ 
rem 5.2 of Polak (2003) and is included here for completeness. Suppose that {xatItvsn 
is a sequence in X such that ^ x as N ^ oo, and suppose that i/n G Ypp{x]\f) for 
each N eN. Without loss of generality, we assume that i/n ^ y as N ^ oo. Then 

limsup'^7v(a^Ar) = hm a;(a; at, i/tv) = ^) <'?/’(£), (IV.ll) 

N^oo N^oo 

where we use the fact that •) is continuous; see Proposition IV.3. 

Next, suppose that -ipi^x) = u{x,y*) for some y* G Yn{x). Then since 
dist(yAr, V) — )■ 0 as iV —)■ oo, there exists a y'p^ eYn such that y'pj ^ y*. Hence, 

liminf'^7v(a^Ar) > hm a;(a;Ar,|/)v) =I/*) =(IV.12) 

N—>-oo N—>-oo 

This proves that if xn -E x as N ^ oo, then 'iPn{xn) -E 'ip{x). Based on Proposition 
1.3, the conclusion follows. □ 

Lemma IV. 6. Suppose that Assumption IV.2 holds. The sequence of problems 
{(GMXMAr)}ArgN cpi-converges to (GMXM) as N ^ oo. 

Proof. The proof follows the same arguments as the proof for Lemma IV.5, with 
Proposition IV.4 replacing Proposition IV.3. □ 

We next provide two theorems, which directly follow from the epi- 
convergence of the discretized min-max-min problems to (MXM). Again, we hrst 
consider the constant Z case before the variable Z case. 

Theorem IV. 7. Suppose that Assumption IV. 1 holds. If {xat} is a sequence of 
global minimizers of (SMXMat) and there exists an infinite subset K E N such that 
xjq X as N ^ oo, thenx is a global minimizer of {SMXM), and%l)iq{xjq) 'ip{x) 
as N ^ oo. 
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Proof. The conclusion follows from Lemma IV.5 and Proposition 1.4. □ 

Theorem TV.8. Suppose that Assumption IV.2 holds. If {xjsi} is a sequence of global 
minimizers of (GMXMtv) and there exists an infinite subset K eN such that xjq 
X as N ^ oo, then x is a global minimizer of (GMXM), and 'iPn{xn) — 'ip{x) as 
N —)■ oo. 


Proof. The conclusion follows from Lemma IV.6 and Proposition 1.4. □ 

Theorem IV.7 imply that if we pick a large iV G N, and solve (SMXMjv) 
to obtain a global minimizer a; at, then xjv is an approximation to a global minimizer of 
(SMXM). The same is true regarding Theorem IV.8. As (SMXMtv) and (GMXMat) 
are still difficult problems to solve, we reformulate them into hnite minimax problems 
next. 


b. Equivalent Finite Minimax Problem 

In this subsection, we first introduce a discretized min-min-max prob¬ 
lem that we show is equivalent to the discretized min-max-min (GMXMtv) in some 
sense. We then show that the new min-min-max problem can be seen as a hnite 
minimax problem. To show the equivalence of this new min-min-max problem to 
(GMXMat), we introduce some notational changes to (GMXMat). 

Without loss of generality, we assume that Iat = {l/i, 1 / 2 , •••, l/v}, and 


we re-express 


'0Ar(a:) = max min ipVx,z), 

^ ’ j&N z&Z{x,yP^ ^ 


(IV.13) 


where J\f = {1, 2,..., N}, and the function x W ^ M., j E M, is dehned by 


A 


ip^{x,z) = (j){x,yj,z). 


(IV. 14) 


We now introduce the equivalent problem to (GMXMat): 

(GMMXtv) min^jv(a:), (IV.15) 

x£X 
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where ^ N G N, is defined by 

^n{x)= min m.a.'x (p^ {x, z), (IV.16) 

z&Z^{x) j&N 

(fp : X ^ IR, j G -A/”, is defined by 

^^{x,z) = Lp^{x,Zj), (IV. 17) 

7 = {zf, Z 2 , ...,zjf)’^, Zj G Z{x, Uj) for all j G A/", and Z^{x) = Z{x, yi) x Z{x, 1 / 2 ) X 
... X Z{x,yisf). In order to allow for the exchange of the min and max operators, we 
introduce a variable for each y G Y^. This expands the dimension of by a factor 
of N. We denote the equivalent discretized min-min-max problem for (SMXM) by 
(SMMX^). 

The next result proves the equivalence of the new discretized min-min- 
max problem to the discretized min-max-min problem. We first consider the constant 
Z case. We define Z^ = Z x Z x ... x Z. 

Theorem IV.9. Suppose that Assumption IV. 1 holds. Then for all N ^ N and 
X e X, fjN{x) = fjN{x). 

Proof. For all a; G X and j G Af, since p>^{x, •) is continuous on Z and Z is compact, 
there exists a Zj{x) G Z such that 

<p\x, Zj{x)) = mimp^{x, z). (IV.18) 

z£Z 

We define z{x) = {zi{x)'^, Z 2 {x)'^,..., zn{x)'^)'^. Then 

= vnaxiip^ {x,Zj{x)) 
j&N 

= min max cp^ (x, z) 
zez’^,z=z{x) jeJV 

> min ma,x(p^ {x, z) ='ipN^x). (IV.19) 

zeZ^ j&N 

Next, for all x E X, since max j (p^ {x, •) is continuous on Z^, and Z^ is compact, 

there exists a z{x) G Z^ such that 

iPn{x) =max(p^{x,z{x)). (IV.20) 

j&N 
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Then 


'f’Ni.x) = max {x, Zj{x)) 

= max min Lp^{x,z) 

z=Zj(x),z£Z 

> maxmm {x, z) = 'ipisfix). 

j&N z&Z 

(IV.21) 

The conclusion follows since f^N^x) > fjN^x) and ^Jn^x) '>'iI)n{,x). 

□ 

We observe that the min-min-max problem (SMMXtv) 

is a constrained 

hnite minimax problem of the form 


(FMX^) miuT^H, 

weW 

(IV.22) 

where 


'Fjv(t(;) = max 

(IV.23) 

P{w) = (fi^{x,z), 

(IV.24) 

and w = zf, zj, ..., zJfZ C IF = V x Z^. 



Note that we obtain the simpler hnite minimax problem (FMXat) from 


the discretized min-max-min (SMXMat) at the expense of a larger number of variables, 
i.e., w e 

The results above are next generalized to the variable Z case. 

Theorem IV. 10. Suppose that Assumption IV.2 holds. Then for all N ^ H and 
X e X, ^n{x) = i)N{x). 

Proof. The proof follows the same arguments as the proof for Theorem IV.9, with 
obvious notational changes. □ 

The generalized min-min-max problem (GMMXjv) is also a constrained 
hnite minimax problem, with a form similar to (FMXjv) dehned in (IV.22)-(IV.24), 
except that the set W is replaced by Wx = {(a;, f) G x \ x e X,z e Z^{x)}. 
We denote the constrained hnite minimax problem for (GMXM) as (GFMXtv). 

We next propose an algorithm that produces an approximation to a 
global minimizer of (MXM) by solving the constructed hnite minimax problem. 
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2. Algorithm for Semi-Infinite Min-Max-Min 

In this subsection, under the assumption that there exists a constrained hnite 
minimax algorithm that produces a global minimizer of (FMXtv), we propose an 
algorithm that obtains a point that is close to a global minimizer of (SMXM). We 
describe a constrained finite minimax algorithm that satishes the assumption in the 
numerical section. In this subsection, we only consider the constant Z case, however, 
all the results equally apply to the variable Z case. 

From this point on, we refer to those algorithms that are applied to solve 
(FMXat), iV G N, as algorithm maps, to differentiate them from the overall algo¬ 
rithm for (SMXM). We develop the convergence results of our approach based on a 
constrained hnite minimax algorithm map that satishes the following assumption. 

Assumption IV. 11. Suppose that Assumption IV. 1 holds. Given an N E fl, the 
algorithm map applied to solve (FMX^r) generates a sequence C X x Z^, and 

every accumulation point of that sequence is a global minimizer of (FMXat). □ 

In view of the above results, the following algorithm for (SMXM) is simple. 
Algorithm IV. 1. Semi-Inhnite Min-Max-Min Algorithm 
Parameter: N E N. 

Step 1. Generate a sequence by applying a constrained hnite minimax 

algorithm map that satishes Assumption IV.11 to (FMXat). □ 

The next theorem implies that if we choose a high level of discretization, i.e., 
large N, then from every accumulation point of the sequence generated, we can easily 
construct a point that is a global minimizer of the discretized min-max-min problem 
(SMXMat). Thus, if the level of discretization N increases to inhnity, the points 
constructed approach the global minimizer of the original semi-inhnite min-max-min 
problem (SMXM), due to Theorem IV.7. 

Theorem IV. 12. Suppose that Assumption IV. 1 holds, and that Algorithm IV. 1 is 
applied to solve (SMXM) with a given N eN, and it generates a sequence C 
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X X . If w* = {x*,z*) with X* E X and z* G , is an accumulation point of 
then X* is a global minimizer of (SMXMat). 

Proof. The conclusion follows directly from Theorem IV.9. □ 

D. NUMERICAL RESULTS 

In this section, we apply our approach on a DAD problem with a ten-node 18- 
arc network as shown in Figure 2. The problem parameters, e.g., ubsupplt/u, demandu, 
and totalsorties are obtained by uniform random number generators based on bounds 
that we provide. We set the bounds in such a way that more supply can be placed 
at nodes 1-5 than 6-10, while the demands are higher at nodes 6-10 than 1-5. This 
ensures that we have flow from the left-hand side of the network to the right-hand 
side. We refer to Appendix E for the problem parameters generated and used in this 
study. We use a discretization level of N =1,000, i.e., we consider 1,000 randomly- 
generated attack plans. Each attack plan provides the sorties to launch against the 
18 arcs. 

We solve the constrained finite minimax problem constructed in our approach 
by reformulating it into a standard nonlinear constrained problem and solving it 
using a sequential quadratic program (SQP) algorithm. We implement and run the 
algorithm in MATLAB version 7.10 (R2010a) (see Mathworks, 2009) on a 3.46 GHz 
PC with two quad-core processors, using Windows 7 Pro, with 24 GB of RAM. We 
use the SQP algorithm in TOMLAB SNOPT solver, see Gill et al. (2007). 

For our problem with ten nodes and 18 arcs, and a discretization level N = 
1, 000, (FMXtv) has 1,000 functions and approximately 18,000 variables, and takes ap¬ 
proximately 4.5 hours, while the (GFMX^r)) has 11,000 functions and approximately 
38,000 variables, and takes approximately 1.5 hours. The smaller (FMXat) requires 
a longer run time because there are more nonlinear components in its formulation, 
where the balance of flow and capacity constraints have been modeled as nonlinear 
penalty cost terms in the objective function. 
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We first discuss the results for (SMXM). We refer to Figure 2 for the solutions 
obtained from solving (FMXat). The optimal supply solution and the required de¬ 
mand are stated on the nodes. The supply numbers highlighted in red (specihcally 
those for nodes 6-10) indicate that the proposed supplies are at their ubsupplyu val¬ 
ues. The worst-case attack plan is one that concentrates attack on arcs (4,6) and 
(5,7), see the details on this attack plan in Appendix E. This worst-case attack plan 
is reasonable based on the problem parameters, where more supply can be placed at 
those nodes on the left-hand side of the network, while higher demands are required 
at the nodes on the right-hand side. The optimal flow after the worst-case attack is 
stated on the arcs, and the objective function value is 1,288. 


SUPPLY. 


SUPPLY^ 
(demand^) 


(demand^,) 


4.8 8.6 3.8 4.5 

Obj. value (2.9) (2.5) (8.2) (8.3) 



The proposed supply sums up to 46.0. This is less than the total demand of 
54.5. We develop another optimization model, which we refer to as the verification 
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models to verify if the solution obtained from our approach is reasonable, and to 
determine why the proposed supply is less than the total demand. When given an 
arbitrary SUPPLY, the verihcation model runs throngh all 1,000 attack plans, and for 
each attack plan, determines the optimal flow and associated objective function valne. 
We test the verihcation model with several alternative snpply solntions that snm to 
the reqnired demand of 54.5, and obtain objective fnnction values no smaller than 
approximately 2,300. We conclude that the proposed supply is less than the total 
demand because the sorties have reduced the capacity of the network to an extent 
that any additional supply is unable to how to satisfy any ontstanding demand. Thus, 
we do not gain any beneht by adding snpply, and worse still, we incur the additional 
cost of storing snpply as well as incnr penalty for balance of how constraint violations 
as the additional supply cannot how out. 

We next state the resnlts for (GMXM). We refer to Figure 3 for the solutions 
obtained from solving (GFMXjv). The proposed supply is the same as that proposed 
for (SMXM), except for the smaller supply placed at nodes 1, 4, and 5. The optimal 
how after the worst-case attack is stated on the arcs, and the objective fnnction value 
is 891, see the details on the attack plan in Appendix E. The objective fnnction value 
for (GMXM) is signihcantly diherent from that of (SMXM) as the two problems 
have diherent objective fnnctions. The worst-case attack and the optimal how for 
(GMXM) are signihcantly diherent from that of (SMXM). 

E. CONCLUSIONS FOR SEMI-INFINITE MIN-MAX-MIN 

This chapter focuses on the semi-inhnite min-max-min problem. We propose 
an approach that constrncts a hnite minimax problem with a larger dimensionality 
than the original min-max-min problem, through discretization and reformnlation of 
the original problem. Our approach is the hrst to solve the generalized semi-inhnite 
min-max-min problem, and it also solves the semi-inhnite min-max-min problem. The 
numerical results show that the approach produces reasonable solutions. 
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SUPPLX, SUPPLY^ 


(demand^) (demand,, 



FLOW, 



4.8 7.1 3.8 4.5 

Obj. value (2.9) (2.5) (8.2) (8.3) 



Figure 3. Optimal Supply and Flow Solution for (GMXM). 
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V. CONCLUSIONS AND FUTURE 

RESEARCH 

A. CONCLUSIONS 

Optimization problems with uncertain parameters arise in numerous applica¬ 
tions. One possible approach to handle such problems is to consider the worst-case 
value of the uncertain parameter during optimization. We consider three problems re¬ 
sulting from this approach: a hnite minimax problem (FMX), a semi-inhnite minimax 
problem (SMX), and a semi-inhnite min-max-min problem (MXM). In all problems, 
we consider nonlinear functions with continuous variables. We develop rate of conver¬ 
gence and complexity results, and propose algorithms for solving these optimization 
problems. 

We develop rate of convergence and complexity results of smoothing algorithms 
for solving (FMX) with many functions. We hnd that smoothing algorithms may only 
have sublinear rates of convergence, but their complexity in the number of functions q 
is O(glogg), as compared to O(g^) for the sequential quadratic programming (SQP) 
algorithms, which our numerical results as well as those in the literature show to 
be one of the fastest for solving (FMX). The competitive complexity for smoothing 
algorithms is due to its small computational work per iteration. We present two 
new smoothing algorithms for (FMX) with novel precision-adjustment schemes, and 
show that they are competitive with other algorithms from the literature. They are 
especially efficient for problems with many variables, or where a signihcant number 
of functions are nearly active at stationary points. The new algorithms are easy to 
implement and do not require any QP solver, which is required for algorithms such 
as the SQP and Pshenichnyi-Pironneau-Polak (PPP) minimax algorithm. One of our 
proposed precision-adjustment schemes is simpler and more efficient than the scheme 
used in the existing smoothing algorithms, which provides a good alternative when 
developing new smoothing algorithms. Our numerical results indicate that smoothing 
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with first-order gradient methods is likely the only viable approach to solve a (FMX) 
with a large number q of functions and problem dimensionality d, due to memory 
limitations. The SQP and PPP algorithms need to compute and provide the gradient 
information {q x d matrix) to the QP solver, and so the size of the problem that 
can be solved is limited by the memory required to store the q x d matrix, as well 
as the memory required by the QP solver to process the gradient information. In 
smoothing algorithms, we do not require the memory to store the full qx d matrix, as 
the gradient of the smoothed function can be constructed by sequentially considering 
portions of the gradient matrix. 

For (SMX), we develop and compare rate of convergence results for various 
hxed and adaptive discretization algorithms, as well as an e-subgradient algorithm. 
We present a novel way of expressing rate of convergence, in terms of computational 
work instead of the typical number of iterations, which we use throughout the anal¬ 
ysis of (SMX). Hence, we are able to identify algorithms that are competitive due 
to low computational work per iteration even if they require many iterations. We 
show that to solve (SMX), a hxed discretization algorithm with quadratically or lin¬ 
early convergent algorithm map to solve the discretized problem can achieve the same 
asymptotic convergence rate attained by an adaptive discretization method. Under 
certain convexity-concavity assumptions, we show how the rate of convergence for dis¬ 
cretization algorithms depend on the dimension of the uncertain parameters, while 
e-subgradient algorithms do not. This indicates that, under convexity-concavity as¬ 
sumptions, discretization algorithms will not be competitive against e-subgradient 
algorithms for moderate to large dimension of the uncertain parameters. Our numer¬ 
ical results show that discretization algorithms are not competitive to e-subgradient 
algorithms for convex-concave problems with a dimension of the uncertain parameters 
as small as two. 

We propose a new approach to solve (MXM), based on discretization and re¬ 
formulation of (MXM) into a constrained hnite minimax problem with a larger dimen- 
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sionality than the original (MXM). Our approach is the first to solve (GMXM) in the 
literature, and it also solves (SMXM). We apply our approach to a defender-attacker- 
defender network interdiction problem, and the results demonstrate the viability of 
our approach. 

B. FUTURE RESEARCH 

There are several possibilities for extending the research of this dissertation. 
The two smoothing algorithms developed for (FMX) in this dissertation produce a 
working set that is monotonically increasing. The efficiency of the active-set SQP 
algorithm shows the potential benehts of an aggressive active-set strategy that keeps 
the working set small. However, when we implement the active-set strategy from the 
SQP algorithm in our smoothing algorithms, we see slower run times, which indicates 
that some kind of hne-tuning on the active-set strategy is probably required. Thus, 
an extension would be to custom-£t an active-set strategy for smoothing algorithms. 

Another opportunity for extension concerns the precision-adjustment scheme 
in the smoothing algorithm for (FMX) that requires user-specified parameters. It 
would be worthwhile to develop procedures for rationally selecting these parameters, 
as it is difficult for users to come up with good choices for the parameters. 

We show that the e-subgradient algorithm has better rate of convergence for 
solving (SMX) than discretization algorithm. However, the e-subgradient algorithm 
requires a concavity assumption to ensure that the computational work to obtain 
an e-maximizer (global maximum) for the uncertain parameters remains bounded. 
Without the concavity assumption, it would be interesting to see how the rate of 
convergence results for other algorithms such as the exchange algorithms compare to 
discretization algorithms, since exchange algorithms will also need to implement some 
form of discretization or branch-and-bound techniques to obtain a global maximizer 
for the uncertain parameters. 

Our approach to solve (MXM) constructs a constrained finite minimax problem 
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with a large number of functions and variables. The constructed problem has a 
special structure, each function depends on only a small number of variables, the 
same number as the sum of the number of variables in the innermost and outermost 
minimization problem. It would be useful to develop special hrst-order algorithms 
that utilize this special structure. 
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APPENDIX A. FINITE MINIMAX PROBLEMS 


Table 17 describes the problem instances used for the numerical studies in 
Chapter II. Most columns are self-explanatory. Columns 2 and 3 give the number 
of variables d and functions q, respectively. The target values (Column 7) are equal 
to the optimal values (if known) or a slightly adjusted value from the optimal values 
reported in Polak et al. (2003); Zhou and Tits (1996) for smaller q. The same target 
values are used for ProbA-ProbM in Tables 5 and 6. 

In this appendix, we denote components of a; G by subscripts, i.e., x = 
{xi,X 2 , ■■■,Xd) G When the problem is given in semi-inhnite form, as in (A.2a) - 
(A.2i), the set Y is discretized into q equally spaced points if 


'ip{x) = max 0 ( 0 ;, y), 

yeY 

(A. la) 

and q/2 equally spaced points if 


%Ij{x) = max 0(a;, y)\. 

yeY 

(A. lb) 

ProbA is dehned by (A.la) and (A.2a), and ProbB-ProbI by (A.lb) and (A.2b)-(A.2i), 

respectively. 


<P{x,y) = (2|/^ - l)x + y{l-y){l-x), Y = [0,1], 

(A.2a) 

Hx, y) = {l- y^) - (O.Sa;^ - 2yx), Y = [-1,1], 

(A.2b) 

=y^ - iyxi + X2exp{y)), Y = [0,2], 

(A.2c) 

xiexp{yx2), Y=[ 0.5,0.5], 

(A.2d) 

(j){x,y) = siny- {y'^xs + yx 2 + xi), Y = [0,1], 

(A.2e) 

(j){x,y) = exp{y) , F = [0,1], 

l + yx3 

(A.2f) 
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<P{x,y) = y/y-[xi-{y‘^xi+yx 2 + x^)\ F = [0.25,1], (A.2g) 

1 

(t){x,y) = + X 2 e^Y>{yxA)], F = [-0.5, 0.5], (A.2h) 

(t){x,y) = - [xiexp{yx4^) +X2exp{yx5) + x^expiyxQ)], 

l + y 

y = [-0.5,0.5], (A.2i) 

ProbJ-ProbM are defined by 'ip{x) = maXj^Q (x), with f^{x) as in (A.2j)-(A.2ni), 
respectively. 

f^{x) = x], j = {l,...,g}, (A.2j) 

p(x) = a:Q_i) 2 +i + xlj, j = {1,..., q}, (A.2k) 

f^{x) = + a;^j_4)4+2 + a;(j_4)4+3 + xlj, j = {1, (A.21) 

f^{x) = xl.+xl, j = |l,2,3,..., Q I, (A.2ni) 

where (/cj, Ij) are all 2-conibinations (see Section 3.3 of Brualdi 2004) of {1, 2, 3,..., d}, 
and 

f\x) = ttjxl + bjXi + CjJ = {1, ...,g}, (A.2n) 

where i = ^ , and (ij,bj, Cj are randomly generated from a uniform distribution on 
[0.5,1]. 
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Table 17. Finite minimax problem instances. An asterisk * indicates that the problem instance are created by the 
authors. 
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APPENDIX B. FINITE MINIMAX 
ALGORITHM DETAILS AND PARAMETERS 


PPP. Pshenichnyi-Pironneau-Polak min-max algorithm (Algorithm 2.4.1 in Polak 
1997) use a = 0.5,/3 = 0.8, and 5 = 1. We use the same Armijo parameters a and f3 
for all algorithms. 

e-PPP. e- Active PPP algorithm (Algorithm 2.4.34 in Polak (1997); see also Polak 
2008) use the same parameters as above. We implement the most recent version Polak 
(2008). 

SQP-2QP. Sequential Quadratic Programming with two QPs in each iteration (Al¬ 
gorithm 2.1 of Zhou & Tits 1996) use parameters recommended in Zhou and Tits 
(1996) and monotone line search. (We examined the use of nonmonotone line search 
in CFSQP, but hud it inferior to monotone line search on the set of problem instances.) 
SQP-IQP. Sequential Quadratic Programming with one QP in each iteration (Al¬ 
gorithm A in Zhu et al. 2009) use mid-point values stated in Algorithm A, a = 0.25 
(not the Armijo parameter), r = 2.5, and Hq = I. The same settings for a and Hq 
are used by a co-author in Zhu and Zhang (2005). 

SMQN. Smoothing Quasi-Newton algorithm (Algorithm 3.2 in Polak et al. 2008) 
use po = 1; B{-) = I, and Parameter Adjustment subroutine version “Case (A)” of 
Polak et al. (2003). 

Algorithm II.2. This algorithm uses the same parameters as SMQN, except for in 
the Adaptive Penalty Parameter Adjustment subroutine, where it uses .^ = 2,^ = 2. 
Algorithm II. 3. This algorithm use parameters t = 10~^,(p = l,po = = 

(log q/t) ■ 10^°, K = lO^o, ^ = 2 ,7 = t ■ 10-1°, z/ = 0.5, Ap = 10. 
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APPENDIX C. SEMI-INFINITE MINIMAX 

PROBLEMS 

In this appendix, we denote components of a; G and y ^ Y G M™' by 
subscripts, for example, x = (xi, X 2 , ■■■,Xd)- Two problem instances from Rustem and 
Howe (2002) are used for the numerical studies in Chapter III. The problem (SMX) 
to be solved is as dehned by (Uhl) and (HI.2), with (f>{x,y) as dehned below: 


2 


(j){x,y) 

= b'^x^i-y‘^+ xi{-y + 5)+X2{y + S), 

i=l 

(C.la) 

(j){x,y) 

2 2 

= -^//f + a;i(-//i + 1/2 + 5) +a;2(//i -//2 + 3), 

2 = 1 2=1 

(C.lb) 

(j){x,y) 

= -{xi- l)yi - {X 2 - 2)//2 - (xs - l )//3 + 2x1 + + A 

0 



- Yyl 

(C.lc) 


i=l 


The second (SProbB) and third (SProbC) problem instances are Problems 1 
and 5 on pp. 100-102 of Rustem and Howe (2002), respectively. SProbB and SProbC 
have ^-dimensionality of two and three respectively. There are no problem instances 
in Rustem and Howe (2002) with ^/-dimensionality of one. We create SProbA from 
SProbB by removing y 2 and replacing yi by y. All three problem instances are convex- 
concave, i.e., 0(-, y) is convex for any hxed y eY, and 0(a;, •) is concave for any hxed 
X G As (f>{-,y) is convex for any hxed y, a subgradient is guaranteed to exist, 
which is a pre-requisite for the e-subgradient algorithm. Algorithm HI.3. As (j){x, •) 
is strictly concave for any hxed x, there exists a unique //-maximizer for each hxed x. 

Table 18 provides more details on the problem instances. Columns 2 and 3 give 
the dimensions of the solution space d and the uncertain parameter m, respectively. 
Columns 4 and 5 give the initial points to the solution xq and the uncertain param¬ 
eter yo, respectively. Note that yo is only relevant for the e-subgradient algorithm. 
Algorithm HI.3, as it is not required for the discretization algorithms. 
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For the target solutions (last column), we use Algorithm III.3 (as it shows 
signihcantly faster run times than the discretization algorithms during the preliminary 
experiments) with parameters as in Appendix D, we start with a stepsize of a = 
0.1 and run the algorithm until the solution remains unchanged for more than ten 
iterations, we use this solution to warm-start the next stage where we decrease the 
stepsize to a = 0.01 and repeat the process until a = 10“^. We do not use the optimal 
solutions as reported in Rustem and Howe (2002) as Rustem and Howe (2002) uses 
a different termination criteria. The optimal solutions obtained with our procedure 
agree with those reported in Rustem and Howe (2002) at least to the fourth decimal 
place. 

The other columns are self-explanatory. 
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Target solution 

(-0.49090909, -0.30909091) 

(-0.48333332, -0.31666667) 
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APPENDIX D. SEMI-INFINITE MINIMAX 
ALGORITHM DETAILS AND PARAMETERS 


Algorithm III.l (applies to both e-PPP and SQP-2QP). Given a discretization 
parameter TV G N (|PAr|), the discretization scheme discretize each dimension of y into 
equally spaced points, which gives a total of N grid points. For the numerical 
studies, N are chosen such that iV^/™ are integers. 

Algorithm III.l with e-PPP. The e-PPP algorithm is the same algorithm used in 
Chapter II. The same Armijo parameters a = 0.5,/3 = 0.8, and <5 = 1 are used. The 
e parameter for determining the active set is set at 10“^, which is the value used for 
the algorithm comparison in Chapter II. 

Algorithm III.l with SQP-2QP. The SQP-2QP algorithm is the same algorithm 
used in Chapter II. Similar to Chapter II, we use the parameters recommended in 
Zhou and Tits (1996) and monotone line search. The e parameter for determining the 
active set is set at 1, which is the value used for the algorithm comparison in Chapter 

II. 

Algorithm III. 3. Our preliminary numerical tests show very fast run times for 
Algorithm III. 3 as compared to the other two discretization algorithms. Thus, we 
spent minimal effort in sensitivity analyses to hue-tune the algorithm parameters. 
We use a constant stepsize a = 0.1, which the preliminary tests show to be robust. 
For Step 2 of Algorithm III. 3, we use TOMLAB SNOPT with its default tolerances 
to hud the y-maximizer, and the dual y iterate from the previous major iteration is 
used to warm-start the search for the y-maximizer in the current major iteration. 
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APPENDIX E. SEMI-INFINITE MIN-MAX-MIN 
PROBLEM PARAMETERS AND RESULTS 


We generate the data for the defender-attacker-defender (DAD) network using 
Table 19, which are used for both (SMXM) and (GMXM). Once the problem data are 
generated, see Table 20, we hold them hxed for the analyses. For data e, penbal, and 
pencap, which are not data from the problem but are required for the formulation, 
we do not randomly generate their values. Instead, we £x their values, e = 0.001, 
penbal= 50, and pencap= 50. 


Data 

Random generators 

costsupply 

5xU(2,4), 5xU(10,13) 

costflow 

U(l,ll) 

demand 

5xU(2,3), 5xU(8,10) 

totalsorties 

U(20,22) 

ubsupply 

U(10,12) 

ubflow 

U(5,15) 

vul 

U(0.5,2) 

vulcap 

0.6 X ubflow 


Table 19. Random generators used to produce the DAD problem parameters in Table 
20. The phrase 5xU(2,4), 5xU(10,13) represents that a total of ten random numbers 
are generated, the first five are uniformly distributed between two and four, and the 
last five numbers are uniformly distributed between ten and 13. 

We generate 1,000 random attack plans, each attack plan has total sorties over 
the 18 arcs no greater than 20.7 sorties. Specifically, we generate 18 random numbers, 
each 11(0,20.7/4). If the sum of the 18 numbers is no greater than 20.7, we accept the 
set as an attack plan. We repeat until we accumulate 1,000 attack plans. The factor 
“4” in “20.7/4” is chosen empirically. 

An initial point wq = (0.5 x ubsupply, 0.5 x ubflow, 0.5 x ubflow ,...) is used for 
(SMXM). An initial point Wq = (0.5 x ubsupply, 0.5 x ubflow, 0.5 x ubflow, ...0.5 x 
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ubflow, 0,0,0) is used for (GMXM), where the string of O’s is for EXCESSSUPPLY 
and EXTRASUPPLY. 

Based on the 1,000 attack plans, the (SMXM) solution obtained from solving 
(FMXjv) is shown in Table 21, which states (i) the optimal supply distribution plan 
before the attack, (ii) the worst-case attack plan, and (hi) the optimal flow after the 
worst-case attack. The objective function value for this solution is 1,288. 

For (GMXM), the solution is shown in Table 22, with an objective function 
value of 891. Note that the objective functions for (SMXM) and (GMXM) are differ¬ 
ent, which explains the signihcantly different objective function values. 
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Table 22. Results for (GMXM). 
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